Preface


We wrote this book with the aim of inspiring the reader to explore mathematics.
Our goal is to provide opportunities for students to discover mathematical
ideas in the context of applications. Before any formal mathematics,
the text starts with two main data applications—radiography/tomography of 
images and heat diffusion—to inspire the creation and development of Linear 
Algebra concepts. 

The applications are presented primarily through a sequence of explorations.
Readers first learn about one aspect of a data application, and then, in
an inquiry framework, they develop the mathematics necessary to investigate 
the application. After each exploration, the reader will see the standard 
definitions and theorems for a first-year Linear Algebra course, but with the 
added context of the applications. 

A novel feature of this approach is that the applied problem inspires the 
mathematical theory, rather than the applied problem being presented after 
the relevant mathematics has been learned. Our goal is for students to 
organically experience the relevance and importance of the abstract ideas of 
linear algebra to real problems. We also want to give students a taste of 
research mathematics. Our explorations ask students to make conjectures and 
answer open-ended questions; we hope they demonstrate for students the 
living process of mathematical discovery. 

Because of the application-inspired nature of the text, we created a path
through introductory linear algebra material that arises naturally in the
process of investigating two data applications. This led to a couple of key
content differences from many standard introductory linear algebra texts.
First, we introduce vector spaces very early on as the appropriate settings
for our problems. Second, we approach eigenvalue computations from an
application/computation angle, offering a determinant-free method as well as
the typical determinant method for calculating eigenvalues.

Although we have focused on two central applications that inspire the
development of the linear algebra ideas in this text, there is a wide array of
other applications and mathematical paths, many of which relate to data
applications, that can be modeled with linear algebra. We have included “sign
posts” for these applications and mathematical paths at moments where the
reader has learned the necessary tools for exploring the application or
mathematical path. These applications and mathematical paths are separated into
three main areas: Data and Image Analysis (including Machine Learning),
Dynamical Modeling, and Optimization and Optimal Design.
Outline of Text


In Chapter 1 we outline some of the fundamental ways that linear algebra is
used in our world. We then introduce, with more depth, the applications of 
radiography/tomography, diffusion welding, and heat warping of images, 
which will frame our discussion of linear algebra concepts throughout the 
book. 

Chapter 2 introduces systems of equations and vector spaces in the context 
of the applications. The chapter begins with an exploration (Section 2.1) of 
image data similar to what would be used for radiographs or the reconstructed 
images of brain slices. Motivated by a question about combining images, 
Section 2.2 outlines methods for solving systems of equations. (For more 
advanced courses, this chapter can be skipped.) In Section 2.3, we formalize 
properties of the set of images (images can be added, multiplied by scalars, 
etc.) and we use these properties to define a vector space. While Section 2.3 
focuses on the vector spaces of images and Euclidean spaces, Section 2.4 
introduces readers to a whole host of new vector spaces. Some of these (like 
polynomial spaces and matrix spaces) are standard, while other examples 
introduce vector spaces that arise in applications, including heat states, 7-bar 
LCD digits, and function spaces (including discretized function spaces). 
We conclude the chapter with a discussion of subspaces (Section 2.5), again 
motivated by the setting of images. 

Chapter 3 delves into the fundamental ideas of linear combinations, span, 
and linear independence, and concludes with the development of bases and 
coordinate representations for vector spaces. Although the chapter does not 
contain any explorations, it is heavily motivated by explorations from the 
previous chapter. Specifically, the goal of determining if an image is an
arithmetic combination of other images (from Section 2.1) drives the
definition of linear combinations in Section 3.1, and also adds context to the
abstract concepts of the span of a set of vectors (Section 3.2) and linear
independence (Section 3.3). In Sections 3.4 and 3.5, we investigate how
linearly independent spanning sets (bases) in the familiar spaces of images
and heat states are useful for defining coordinates on those spaces. This
allows us to match up images and heat states with vectors in Euclidean
spaces of the same dimension. We conclude the chapter with a “sign post” for
regression analysis.

Chapter 4 covers linear transformations. In Section 4.1, readers are taken
through an exploration of the radiographic transformation beginning with a
definition of the transformation. Next, they use coordinates to represent this
transformation with a matrix, and in Section 4.2, they investigate
transformations more generally. In Section 4.3 readers see how the heat
diffusion operator can be represented as a matrix, and in Section 4.4 they
explore more generally how to represent arbitrary transformations between
vector spaces by matrices of real numbers. In Section 4.6, the reader will
return to the radiographic transformation and explore properties of the
transformation, considering whether it is possible for two objects to produce
the same radiograph, and whether there are any radiographs that are not
produced by any objects. This exploration leads to the definitions of
one-to-one and onto linear transformations in Section 4.7. This is also where
the critical idea of invertibility is introduced; in the radiographic
transformation setting, if the transformation is invertible then
reconstruction (tomography) is possible.

The goal of Chapter 5 is to understand invertibility so that we can solve 
inverse problems. In Section 5.1 readers consider what would happen if the 
radiographic transformation is not invertible. This leads to a study of
subspaces related to the transformations (nullspace and range space). The section
concludes with the rank-nullity theorem. In Section 5.2, the corresponding 
ideas for matrix representations of transformations (nullspace, row space, and 
column space) are discussed along with the introduction of the Invertible 
Matrix Theorem. In Section 5.3 the reader will reconstruct brain slice images 
for certain radiographic setups after developing the concept of a left inverse. 
We conclude this chapter with a “sign post” for linear programming. 

Chapter 6 introduces eigenvector and eigenvalue concepts in preparation 
for simplifying iterative processes. Section 6.1 revisits the heat diffusion 
application. In this exploration, readers examine a variety of initial heat 
states, and observe that some heat states have a simple evolution while others 
do not. Combining this with the linearity of the diffusion operator leads to the 
idea of creating bases of these simple heat states. Section 6.2 formalizes the 
ideas of the previous heat diffusion exploration and introduces eigenvectors, 
eigenvalues, eigenspaces, and diagonalization. Using these constructs, in
Section 6.3 readers once again address the long-term behavior of various heat
states, and start to make connections to other applications. We follow the
application with Section 6.4, where we present many more applications
described by repeated matrix multiplication or matrix/vector sequences.
Within this chapter are “sign posts” for Fourier analysis, nonlinear
optimization and optimal design, and dynamical processes.

Chapter 7 discusses how to find suitable solutions to inverse problems when
invertibility is not an option. In Section 7.1, motivated by the idea of
determining the “degree of linear independence” of a set of images, readers
will be introduced to the concepts of inner product and norm in a vector
space. This chapter also develops the theory of orthogonality. Section 7.2
uses orthogonality to define orthogonal projections in Euclidean space along
with general projections. The tools built here are then used to construct the
Gram-Schmidt Process for producing an orthonormal basis for a vector space. In
Section 7.3, motivated by ideas from earlier tomography explorations, we
develop orthogonal transformations and related properties of symmetric
matrices. Section 7.4 is an exploration in which the reader will learn about
the concepts of maximal isomorphisms and pseudo-invertibility. In Section 7.5,
readers will combine their knowledge about diagonalizable and symmetric
transformations and orthogonality to more efficiently invert a larger class of
radiographic transformations using singular value decomposition (SVD). The
final exploration, in Section 7.6, makes use of SVD to perform brain
reconstructions. Readers will discover that SVD works well with clean data,
but poorly for noisy data. At the end of this section, readers explore ideas
for nullspace enhancements to reconstruct brain images from noisy data. This
final section is set up so that the reader can extend their knowledge of
Linear Algebra in a grand finale exploration reaching into an active area of
inverse problem research. Also included throughout Chapter 7 are “sign posts”
for data analysis tools, including support vector machines, clustering, and
principal component analysis.

Finally, Chapter 8 wraps up the text by describing the exploratory nature 
of applied mathematics and encourages the reader to continue using similar 
techniques on other problems. 


Using This Text 


The text is designed around a semester-long course. For a first course in 
Linear Algebra, we suggest including Chapters 1-6 with selected topics from 
Chapter 7 as time allows. Although the heat diffusion application is not fully 
resolved until Section 6.3 and the tomography application is not fully 
resolved until Section 7.6, one could reasonably conclude a 1-semester
course after Section 6.2. At that point, some (relatively elementary) brain
images have been reconstructed from radiographic data, an exploration of heat
diffusion has led into a study of eigenvectors and diagonalization, and
tomography has motivated ideas that will lead to inner product, vector norm,
projection, and the Gram-Schmidt Process. An outline from an example of our
introductory courses is included on page xi.

Chapter 8 can be a great source of ideas for student projects. We 
encourage anyone using this text to consider applications discussed there. 

This text has also been used for a more advanced second course in Linear 
Algebra. In this scenario, the instructor can move more rapidly through the 
first three chapters highlighting connections with the applications. The course 
could omit Sections 5.3 and 6.3 in order to have adequate time to complete 
the tomographic explorations in Chapter 7. Such a course could additionally 
include the derivation of the diffusion equation in Appendix B and/or a 
deeper understanding of radiographic transformations described in 
Appendix A. 


Exercises 


As mathematics is a subject best learned by doing, we have included exercises
of a variety of types at many levels: concrete practice/computational,
theoretical/proof-based, application-based, application-inspired/inquiry, and 
open-ended discussion exercises. 


Computational Tools


One of the powerful aspects of Linear Algebra is its ability to solve
large-scale problems arising in data analysis. We have designed our
explorations to highlight this aspect of Linear Algebra. Many explorations
include code for students to run in either MATLAB or the free, open-source
software Octave. In most cases, the code can be run in online programming
environments, eliminating the need for students to install software on their
own computers.

Chapter  Sections  Title                                                 # of (50-min) Classes
Ch 1               Introduction to Applications                          1 Class
Ch 2               Vector Spaces
         §2.1      Exploration: Digital Images                           1 Class
         §2.2      Systems of Equations                                  2 Classes
         §2.3      Vector Spaces                                         1 Class
         §2.4      Vector Space Examples                                 1 Class
         §2.5      Subspaces                                             2 Classes
Ch 3               Vector Space Arithmetic and Representations
         §3.1      Linear Combinations                                   2 Classes
         §3.2      Span                                                  2 Classes
         §3.3      Linear Independence                                   2 Classes
         §3.4      Bases                                                 2 Classes
         §3.5      Coordinates                                           1 Class
Ch 4               Linear Transformations
         §4.1      Exploration: Computing Radiographs                    2 Classes
         §4.2      Linear Transformations                                2 Classes
         §4.3      Exploration: Heat Diffusion                           1 Class
         §4.4      Matrix Representations of Linear Transformations
         §4.5      The Determinant of a Matrix                           1 Class
         §4.6      Exploration: Tomography                               1 Class
         §4.7      Transformation Properties (1-1 and Onto)              3 Classes
Ch 5               Invertibility
         §5.1      Transformation Spaces                                 2 Classes
         §5.2      The Invertible Matrix Theorem                         1 Class
         §5.3      Exploration: Tomography Without an Inverse            1 Class
Ch 6               Diagonalization
         §6.1      Exploration: Heat State Evolution                     1 Class
         §6.2      Eigenspaces                                           3 Classes
         §6.3      Exploration: Diffusion Welding                        1 Class
         §6.4      Markov Processes                                      1 Class
Ch 7               Inner Product Spaces
         §7.1      Inner Products                                        1 Class
         §7.2      Projections                                           1 Class


Ancillary Materials 


Readers using this text are invited to visit our website (www.imagemath.org) 
to access data sets and code for explorations. Instructors are able to create an 
account at our website so that they can download ancillary materials. 
Materials available to instructors include all code and data sets for the
explorations and instructor notes and expected solution paths for the 
explorations. 


Pullman, USA Heather A. Moon 
Introduction to Applications


Welcome to your first course in linear algebra—arguably the most useful mathematical subject you 
will encounter. Not only are the skills important for solving linear problems, they are a foundation for 
many advanced topics in mathematics, physics, engineering, computer science, economics, business, 
and more. 

This book focuses on the investigation of specific questions from real-world applications. Your 
explorations will inspire the development of necessary linear algebra tools. In other words, you will
be given interesting open problems, and as you create your own solutions, you will also be discovering
and developing linear algebra tools. This is similar to the way that the original investigators recognized
key concepts in linear algebra, which in turn led to the standardization of linear algebra as a branch of
mathematics. Rather than introduce linear algebra from a vacuum, we will create linear algebra as the
skill set necessary to solve our applications. Along the way we hope that you will discover that linear
algebra is, in itself, an exciting and beautiful branch of mathematics. This book aims to integrate the 
exploratory nature of open-ended application questions with the theoretical rigor of linear algebra as 
a mathematical subject. 


1.1 A Sample of Linear Algebra in Our World 


We began this text with a very bold claim: that linear algebra is arguably the most useful mathematical
subject you will encounter. In this section, we show how linear algebra provides a whole host of
important tools that can be applied to exciting real-world problems as well as used to advance your
mathematical skills. We also discuss just a few interesting areas of active research for which linear
algebra tools are essential. Each of these problems can be expressed using the language of linear algebra.


1.1.1 Modeling Dynamical Processes 


The word “dynamic” is just a fancy way of describing something that is changing. All around us, 
from microscopic to macroscopic scales, things are changing. Scientists measure the spread of disease
and study how to mitigate such spread. Biologists use population models to show how population dynamics
change when new species are introduced to an ecosystem. Astronomers use orbital mechanics to study
the interactions of celestial bodies in our universe. Manufacturers are interested in the cooling process
after diffusion welding is complete. Chemists and nuclear physicists study atomic and subatomic
interactions. Geologic and atmospheric scientists study models that predict the effects of geologic
or atmospheric events. These are only a small subset of the larger set of dynamic models being studied
today.

Differential equations provide the mathematical interpretation of dynamic scenarios. Solutions to the
differential equations give scientists a deeper understanding of their respective problems. Solutions
help scientists determine the best orbit for space telescopes like Hubble or the James Webb Space
Telescope, determine how different mitigation strategies might affect disease spread, and predict the
location of a tsunami as it spreads across the Pacific Ocean.


1.1.2 Signals and Data Analysis 


In our world, data is collected on just about everything. Researchers are interested in determining ways 
to glean information from this data. They might want to find patterns in the data that are unique to a 
particular scenario. How often does the apparent luminosity of a distant star fall because of the transit of 
an orbiting planet? When is the best time to advertise your product to users on a social media platform? 
How do we determine climate change patterns? Researchers are also interested in how to determine 
what the data tells them about grouping like subjects in their study. Which advertisement type matches 
with which social media behavior? How can we classify animal fossils to link extinct animals to the 
animals we see around us? Scientists are interested in determining the division between unlike subjects. 
Will a particular demographic vote for a particular candidate based on key features in an ad campaign? 
They might want to predict outcomes based on data already collected. What can downstream flow and 
turbidity data reveal about forest health and snow pack in the upstream mountains? Image analysts are 
interested in recovering images, determining image similarity, and manipulating images and videos. 
How can we recover the 3D structure of an object from many 2D pictures? How can we view brain 
features from CAT scans using fewer scans? How do we create smooth transitions between images in 
a video? 

Machine learning is an active area of data analysis research. The overarching goal of machine
learning is to create mathematical tools that instruct computers on how to classify research subjects
or predict outcomes. Some key tools of machine learning are Fourier analysis (for pattern
recognition), regression analysis (for predictions), support vector machines (for classification), and
principal component analysis (for finding relevant features or structure).


1.1.3 Optimal Design and Decision-Making 


Optimization is the mathematical process of making optimal decisions. Every day we are confronted 
with an endless stream of decisions which we must make. What should I eat for breakfast? Should 
I check my email? What music will I listen to? Pet the cat or not? Most decisions are made easily 
and quickly without much attention to making a “best” choice and without significantly affecting the 
world around us. However, researchers, business owners, engineers, and data analysts often cannot
ignore best choices. Mathematical optimization seeks a best choice when such a choice is crucial. For
example, an engineer designing a tail fin for a new aircraft cannot produce a design based on a personal 
desire for it to “look cool.” Rather, the relevant criteria should include performance, structural integrity, 
cost, manufacturing capability, etc. 

Optimization is ubiquitous in the sciences and business. It is the tool for engineering design,
multi-period production planning, personnel and/or event scheduling, route-planning for visits or
deliveries, linear and nonlinear data fitting, manufacturing process design, packaging, designing
facility layouts, partitioning data into classes, bid selection (tendering), activities planning,
division into teams, space probe trajectory planning, highway improvements for traffic flow,
allocation of limited resources, security camera placement, and many others.


1.2 Applications We Use to Build Linear Algebra Tools 


This book will consider a variety of pedagogical examples and applications. However, you will 
primarily find two real-world application tasks woven throughout the book, inspiring and motivating 
much of the material. Therefore, we begin our path into linear algebra with an introduction to these 
two questions. 


1.2.1 CAT Scans 


Consider the common scenario of computerized axial tomography (CAT) scans of a human head. The 
CAT scan machine does not have a magic window into the body. Instead, it relies on sophisticated 
mathematical algorithms in order to interpret X-ray data. The X-ray machine produces a series (perhaps 
hundreds) of radiographs such as those shown in Figure 1.1, where each image is taken at a different 
orientation. Such a set of 2D images, while visually interesting and suggestive of many head features, 
does not directly provide a 3D view of the head. A 3D view of the head could be represented as a set 
of head slices (see Figure 1.2), which when stacked in layers provide a full 3D description. 


Task #1: Produce a 3-dimensional image of a human head from a set of 2-dimensional radiographs. 


Each radiograph, and each head slice, is shown as a grayscale image with a fixed number of pixels.
Grayscale values in radiographs are proportional to the intensity of X-rays which reach the radiographic
film at each location. Grayscale values in the head images are proportional to the mass at each location.
The radiographic process is quite complex, but with several reasonable assumptions can be modeled
as a process described well within a linear algebra context. In fact, we used linear algebra concepts
discussed later in this text to produce the images of brain slices seen in Figure 1.2 from surprisingly
few radiographs like those in Figure 1.1. As you can see, the details in the slices are quite remarkable!
The mathematical derivation and brief explanations of the physical process are given in Appendix A.


Fig. 1.1 Three examples of radiographs of a human head taken at different orientations.


Fig. 1.2 Four examples of density maps of slices of a human head.


Fig. 1.3 A diffusion welding apparatus. Picture Credit: AWS Welding Handbook.


1.2.2 Diffusion Welding 


Consider the following scenario involving diffusion welding. Like other types of welding, the goal is
to join two or more pieces together. Diffusion welding is used when it is important not to have a
visible joint in the final product while not sacrificing strength. A large apparatus like the one found in
Figure 1.3 is used for this process. As you can see, a company is unlikely to have many of these on
hand. In our application, we will consider welding together several small rods to make one longer rod.
The pieces will enter the machine in place. Pressure is applied at the long rod ends as indicated at the 
bottom of Figure 1.4. The red arrows show where heat is applied at each of the joints. The temperature 
at each joint is raised to a significant fraction of the melting temperature of the rod material. At these 
temperatures, material can easily diffuse through the joints. At the end of the diffusion process, the 
rod temperature is as indicated in the plot and color bar in Figure 1.4. The temperature is measured 
relative to the temperature at the rod ends, which is fixed at a cool temperature. After the rod cools 
there are no indications (macroscopically or microscopically) of joint locations. 


Task #2: Determine the earliest time at which a diffusion-welded rod can be safely removed from 
the apparatus, without continuous temperature monitoring. 


The rod can be safely removed from the apparatus when the stress due to temperature gradients 
is sufficiently small. The technician can determine a suitable thermal test. It is in the interest of the 
manufacturer to remove the rod as soon as possible, but not to continuously monitor the temperature 
profile. In later chapters, we discuss the linear algebra techniques that will allow us (and you) to 
determine the best removal time. 
Fig. 1.4 Bottom: Diffusion welding to make a rod with four joints. Middle: Color bar indicating temperature along the
rod. Top: Heat difference profile corresponding to temperature difference from the ends of the rod.
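
To make Task #2 more concrete before the formal tools arrive, here is a minimal Python sketch. It is not the method developed later in the text, and the text's own explorations use MATLAB/Octave; Python is used here only for illustration. The sketch assumes that the rod is sampled at a few interior points, that temperatures are recorded relative to the fixed end temperature (so the ends sit at 0), and that one time step moves each value toward the average of its neighbors. The function names, the step parameter `delta`, the sample temperatures, and the threshold are all hypothetical choices.

```python
def diffuse_step(u, delta=0.25):
    """Advance the heat state one time step (assumed explicit averaging rule).

    u holds temperatures at interior points, relative to the rod ends,
    so the two ends are treated as fixed at 0.
    """
    padded = [0.0] + list(u) + [0.0]
    return [
        padded[i] + delta * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def max_gradient(u):
    """Largest temperature jump between neighboring points, ends included."""
    padded = [0.0] + list(u) + [0.0]
    return max(abs(b - a) for a, b in zip(padded, padded[1:]))


def steps_until_safe(u, threshold, delta=0.25, max_steps=10_000):
    """Count time steps until the largest jump drops to the threshold."""
    steps = 0
    while max_gradient(u) > threshold and steps < max_steps:
        u = diffuse_step(u, delta)
        steps += 1
    return steps, u


# Hypothetical initial heat state: two hot joints on an otherwise cool rod.
initial = [0.0, 4.0, 0.0, 4.0, 0.0]
steps, final = steps_until_safe(initial, threshold=0.5)
print("steps needed:", steps)
print("final heat state:", final)
```

Each step applies the same linear rule to the whole heat state, so the removal time can be predicted by computation rather than by continuously monitoring the rod.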


1.2.3 Image Warping 


Consider the scenario of image warping, where you have two images, and you want to create a video 
or a sequence of images that flows smoothly from one image to the other. In our application, we will 
consider techniques similar to those used in the diffusion welding application. This application extends 
the techniques to 2 dimensions. Linear algebra techniques discussed in later chapters will allow us to 
transition between two or more images as depicted in Figure 1.5. 


1.3 Advice to Students 


The successful mastery of higher-level mathematical topics is something quite different from memorizing 
arithmetic rules and procedures. And, it is quite different from proficiency in calculating numerical 
answers. Linear algebra may be your first mathematics course that asks you for a deeper understanding 
of the nature of skills you are learning, knowing “why” as well as “how” you accomplish tasks. This 
transition can be challenging, yet the analytic abilities that you will acquire will be invaluable in future 
mathematics courses and elsewhere. We (the authors) offer the following guidelines for success that 
we feel are worth your serious consideration. 


Keys to Success in Linear Algebra


• You must be willing to explore, conjecture, guess. You must be willing to fail. Explorations are
key to building your understanding.

• You will need to practice learned concepts through exercises, written descriptions, and verbal
discussion. Always explain your reasoning.

• You must be willing to ask questions and participate in classroom activities. Never be satisfied
with a yes/no response.

• Whether you approach mathematics from a computational, theoretical, or applied view, you
must be willing to accept that Linear Algebra is most vibrant when it incorporates each of these
aspects.

• You will need to be willing to read actively and with understanding, working through the examples
and questions posed in the text.


Fig. 1.5 Ten images in the warping sequence that begins with a young boy on a tractor and ends with the same boy a
little older.


1.4 The Language of Linear Algebra 


In this text, you will learn many new definitions and terms. In order to fully understand the
topics in this text, communicate your understanding effectively, and have productive conversations
with other professionals, you will need to become fluent in the language of linear algebra. To
help you recognize appropriate uses of this language, we have added “Watch Your Language!” indicators,
such as the one below, to exemplify it.


Watch Your Language! When communicating about equations and/or expressions, it is important to use
the proper language surrounding each.


✓ We solve equations.
✓ We simplify expressions.

We do not say

✗ We solve an expression.
1.5 Rules of the Game


In this text, we will pose questions for which there isn’t always one correct answer. Sometimes answers 
do not exist, proofs can’t be had, and statements are just not true or are open to interpretation. We do this, 
not to cause frustration, but rather to give you an opportunity to exercise your creative mathematical 
talent, learn to explore possible strategies to recognize truths (and untruths), and to strengthen your 
understanding. However, we do abide by the following list: 


Rules of the Game


• If we ask you to prove something, you can assume the statement is true. On the other hand, we
may ask you whether or not a statement is true. In this case, we will expect you to justify your
assertion.

• If we ask you to find something, you may or may not be able to find it. We want you to experience
the satisfaction of determining the existence as if you were researching a new problem.

• If we ask you to compute something, you can be sure that the task is possible.


1.6 Software Tools 


There are a variety of excellent software tools for exploring and solving linear algebra tasks. In this text, 
we highlight essential MATLAB/OCTAVE commands and functions to accomplish key computational 
and graphical needs. All of the exploratory tasks are written with these software tools in mind. Several 
of these tasks make use of application data files which are collected in MATLAB-compatible format. 
The code and data files can be found on the IMAGEMATH website at http://imagemath.org/AILA.html. 


1.7 Exercises


1. Which of the two application tasks discussed in this chapter do you feel will be the most challenging 
and why? 

2. In analyzing a CAT scan, how many radiographs do you feel would be sufficient to obtain accurate 
results? How many do you believe would be too many? 

3. In the tomography application, do you believe that it is possible that a feature in an object will not 
show up in the set of radiographs? 

4. Consider the tomography application. Suppose the data-gathering instrument returns radiographs 
with missing data for some pixels. Do you believe that an accurate 3D head reconstruction might 
still be obtained? 

5. Consider the diffusion welding application. For what purpose are the ends of the rod held at constant 
temperature during the cooling process? 

6. Consider the diffusion welding application. Do you believe it is possible for a rod to have a 
temperature profile so that at some location the temperature does not simply decrease or simply 
increase throughout the diffusion process? 

7. Consider the diffusion welding application. If at some location along the rod the initial temperature 
is the same as the temperature at the ends, do you believe the temperature at that location will remain 
the same throughout the diffusion process? 
Vector Spaces


In this chapter, we will begin exploring the Radiography/Tomography example discussed in 
Section 1.2.1. Recall that our goal is to produce images of slices of the brain similar to those found in 
Figure 2.1 from radiographic data such as that seen in Figure 2.2. 


Fig. 2.1 Four examples of density maps of slices of a human head. 


In order to meet this goal, we will begin exploring images as mathematical objects. We will discuss 
how we can perform arithmetic operations on images. Finally, we will explore properties of the set of 
images. With this exploration, we will begin recognizing many sets with similar properties. Sets with 
these properties will be the sets on which we focus our study of linear algebra concepts. 


2.1 Exploration: Digital Images 


In order to understand and solve our tomography task (Section 1.2.1), we must first understand the 
nature of the radiographs that comprise our data. Each radiograph is actually a digitally stored collection 
of numerical values. It is convenient for us when they are displayed in a pixel arrangement with colors 
or grayscale. This section explores the nature of pixelized images and provides exercises and questions 
to help us understand their place in a linear algebra context. 

We begin by formalizing the concept of an image with a definition. We will then consider the most 
familiar examples of images in this section. In subsequent sections we will revisit this definition and 
explore other examples. 


Definition 2.1.1 


An image is a finite ordered list of real numbers with an associated geometric arrangement. 

Fig. 2.2 Three examples of radiographs of a human head taken at different orientations. 


Fig. 2.3 Digital images are composed of pixels, each with an associated brightness indicated by a numerical value. 
Photo Credit: Sharon McGrew. 


First, let us look at an image from a camera in grayscale. In Figure 2.3, we see one of the authors 
learning to sail. When we zoom in on a small patch, we see squares of uniform color. These are the 
pixels in the image. Each square (or pixel) has an associated intensity or brightness. Intensities are 
given a corresponding numerical value for storage in computer or camera memory. Brighter pixels are 
assigned larger numerical values. 

Consider the 4 x 4 grayscale image in Figure 2.4. This image corresponds to the array of numbers at 
right, where a black pixel corresponds to intensity 0 and increasingly lighter shades of gray correspond 
to increasing intensity values. A white pixel (not shown) corresponds to an intensity of 16. 

Fig. 2.4 Pixel intensity assignment. 


[Panels: Image A, Image B, Image C] 


Fig. 2.5 Three 4 × 4 grayscale images. The pixel intensities in these particular images are all either 0 or 8. 


A given image can be displayed on different scales; in Figure 2.3, a pixel value of 0 is displayed as 
black and 255 is displayed as white, while in Figure 2.4 a pixel value of 0 is displayed as black and 16 
is displayed as white. The display scale does not change the underlying pixel values of the image. 

Also, the same object may produce different images when imaged with different recording devices, 
or even when imaged using the same recording device with different calibrations. For example, this is 
what a smartphone does when you touch a portion of the screen to adjust the brightness before you 
take a picture with it. 

Our definition of an image yields a natural way to think about arithmetic operations on images such 
as multiplication by a scalar and adding two images. For example, suppose we start with the three 
images A, B, and C in Figure 2.5. 

Multiplying Image A by one half results in Image 1 in Figure 2.6. Every intensity value is now 
half what it previously was, so all pixels have become darker gray (representing their lower intensity). 
Adding Image 1 to Image C results in Image 2 (also in Figure 2.6); so Image 2 is created by doing 
arithmetic on Images A and C. 
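
As a minimal sketch of these two operations in OCTAVE/MATLAB, consider two small hypothetical 2 × 2 images. The arrays X and Y below are made up for illustration; they are not the images in Figure 2.5.

```octave
% Hypothetical 2 x 2 images (illustrative values only).
X = [0 8; 8 0];
Y = [8 8; 0 0];

H = 0.5 * X;   % scalar multiplication: every pixel intensity is halved
Z = H + Y;     % image addition: corresponding pixels are added

disp(H)        % shows 0 4; 4 0
disp(Z)        % shows 8 12; 4 0
```

Both operations act pixel by pixel, so the geometric arrangement of the image is unchanged.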

Caution: Digital images and matrices are both arrays of numbers. However, not all digital images 
have rectangular geometric configurations like matrices,¹ and even digital images with rectangular 
configurations are not matrices, since there are operations² that can be performed with matrices that 
do not make sense for digital images. 


2.1.1. Exercises 


For some of these exercises you will need access to OCTAVE or MATLAB software. The following 
exercises refer to images found in Figures 2.5 and 2.6. 


¹ See Page 39 for examples of non-rectangular geometric configurations. 
² An example of an operation on matrices that is meaningless on images is row reduction. 


[Panels: Image 1, Image 2, Image 3, Image 4] 


Fig. 2.6 Four additional images. Image 1 is (0.5)(Image A) and Image 2 is (Image 1) + (Image C). 


1. Express Image 3 using arithmetic operations on Images A, B, and C. (Note that the pixel intensities 
in Image 3 are all either 4, 8, or 16.) 

2. Express Image 4 using arithmetic operations on Images A, B, and C. (Note that the pixel intensities 
in Image 4 are all either 0 or 16.) 

3. Input the following lines of code into the command window of OCTAVE/MATLAB. Note that 
ending a line with a semicolon suppresses terminal output. If you want to show the result of a 
computation, delete the semicolon at the end of its line. Briefly describe what each of these lines 
of code produces. 


    M_A = [ 
    M_B = [ 
    M_C = [ 

    figure('position', [0,0,1200,360]); 
    GrayRange = [0 16]; 

    subplot(1,3,1); 
    imagesc(M_A, GrayRange); 
    title('Image A'); 

    subplot(1,3,2); 
    imagesc(M_B, GrayRange); 
    title('Image B'); 

    subplot(1,3,3); 
    imagesc(M_C, GrayRange); 
    title('Image C'); 

    colormap(gray(256)); 


4. Enter the following lines of code one at a time and state what each does. 

    M_A 

    M_1 = .5*M_A 

    M_2 = M_1 + M_C 

    figure('position', [0,0,1200,360]); 
    GrayRange = [0 16]; 

    subplot(1,2,1); 
    imagesc(M_1, GrayRange); 
    title('Image 1'); 
    subplot(1,2,2); 

    imagesc(M_2, GrayRange); 
    title('Image 2'); 
    colormap(gray(256)); 

5. Write your own lines of code to check your conjectures for producing Images 3 and/or 4. How 
close are these to Images 3 and/or 4? 

6. We often consider display scales that assign pixels with value 0 to the color black. If a recording 
device uses such a scale then we do not expect any images it produces to contain pixels with 
negative values. However, in our definition of an image we do not restrict the pixel values. In this 
problem you will explore how OCTAVE/MATLAB displays an image with negative pixel values, 
and you will explore the effects of different gray scale ranges on an image. 

Input the image pictured below into OCTAVE/MATLAB. Then display the image using each of the 
following five grayscale ranges. 

    -10    0   20 
     10    5  -20 

(i) GrayRange = [0, 20], 
(ii) GrayRange = [0, 50], 
(iii) GrayRange = [-20, 20], 
(iv) GrayRange = [-10, 10], 
(v) GrayRange = [-50, 50]. 

(a) Describe the differences in the display between: setting (i) and setting (ii); setting (i) and setting 
(iii); setting (iii) and setting (iv); and finally between setting (iii) and setting (v). 

(b) Summarize what happens when pixel intensities in an image exceed the display range as input 
into the imagesc function. 

(c) Summarize what happens when the display range becomes much larger than the range of pixel 
values in an image. 

(d) Discuss how the pixels with negative values were displayed in the various gray scale ranges. 

7. How should we interpret pixel intensities that lie outside our specified grayscale range? 
8. Consider digital radiographic images (see Figure 1.1). How would you interpret intensity values? 
How would you interpret scalar multiplication? 
9. What algebraic properties does the set of all images have in common with the set of real numbers? 
10. Research how color digital images are stored as numerical values. How can we modify our concepts 
of image addition and scalar multiplication to apply to color images? 
11. Describe how a heat state on a rod can be represented as a digital image. 
12. Think of two different digital image examples for which negative pixel intensities make real-world 
sense. 


2.2 Systems of Equations 


In Section 2.1, we considered various 4 x 4 images (see page 11). We showed that Image 2 could be 
formed by performing image addition and scalar multiplication on Images A, B, and C. In particular, 


(Image 2) = (1/2)(Image A) + (0)(Image B) + (1)(Image C). 


We also posed the question about whether or not Images 3 and 4 can be formed using any arithmetic 
operations of Images A, B, and C. One can definitely determine, by inspection, the answer to these 
questions. Sometimes, however, trying to answer such questions by inspection can be a very tedious 
task. In this section, we introduce tools that can be used to answer such questions. In particular, we 
will discuss the method of elimination, used for solving systems of linear equations. We will also use 
matrix reduction on an augmented matrix to solve the corresponding system of equations. We will 
conclude the section with a key connection between the number of solutions to a system of equations 
and a reduced form of the augmented matrix. 


2.2.1 Systems of Equations 


In this section we return to one of the tasks from Section 2.1. In that task, we were asked to determine 
whether a particular image could be expressed using arithmetic operations on Images A, B, and C. Let 
us consider a similar question. Suppose we are given the images in Figures 2.7 and 2.8. Can Image C 
be expressed using arithmetic operations on Images A, D, and F? 

For this question, we are asking whether we can find real numbers $\alpha$, $\beta$, and $\gamma$ so that 

\[
\text{Image C} = \alpha\,(\text{Image A}) + \beta\,(\text{Image D}) + \gamma\,(\text{Image F}). \tag{2.1}
\]


[Panels: Image A, Image B, Image C] 

Fig. 2.7 Images A, B, and C from Section 2.1. 


[Panels: Image D, Image E, Image F] 

Fig. 2.8 Images D, E, and F are more examples of 16-pixel images. 


First, in order to make sense of this question, we need to define what it means for images to be equal. 


Definition 2.2.1 


Let $I_1$ and $I_2$ be images. We say that $I_1 = I_2$ if each pair of corresponding pixels from $I_1$ and $I_2$ 
has the same intensity. 


The convention in Figure 2.4, Definition 2.2.1, and Equation 2.1 give us a means to write an equation, 
corresponding to the upper left pixel of Image D, 


\[ 8 = 0\alpha + 4\beta + 8\gamma. \tag{2.2} \]


This equation has a very specific form: it is a linear equation. Such equations are at the heart of the 
study of linear algebra, so we recall the definition below. 


Definition 2.2.2 


A linear equation is an equation of the form 


\[ a_1x_1 + a_2x_2 + \cdots + a_nx_n = b, \]

where $b \in \mathbb{R}$, $a_1, \ldots, a_n \in \mathbb{R}$ are called coefficients, and $x_1, x_2, \ldots, x_n$ are called variables. 


In the definition above, we use the symbol “$\in$” to mean “element of” or “in the set.” We write, 
above, that $b \in \mathbb{R}$. This means that $b$ is an element of the set of all real numbers. Typically, we read 
this as “$b$ is real.” We will use this notation throughout the text for many different sets. 

This definition considers coefficients which are real numbers. Later, we will encounter some 
generalizations where coefficients are elements of other fields. See Appendix D for a discussion of 
fields. Let us consider some examples of linear equations. 


Example 2.2.3 The equation 
\[ 3x + 4y - 2z = w \]

is linear with variables $x$, $y$, $z$, and $w$, and with coefficients 3, 4, $-2$, and $-1$. We see this by putting 
the equation in the form given in Definition 2.2.2. In this form, we have 

\[ 3x + 4y - 2z - w = 0. \]

The form given in Definition 2.2.2 will often be referred to as standard form. We now follow with 
some equations that are not linear. 


Example 2.2.4 The following equations are not linear: 


\[ 3x + 2yz = 3 \]
\[ x^2 + 4x = 2 \]
\[ \cos x - 2y = 3. \]


As we can see, each equation above applies operations to the variables that are more complicated 
than addition and multiplication by a constant coefficient. 


Let us, now, return to the question at hand. In Equation 2.2, the variables are $\alpha$, $\beta$, and $\gamma$. We seek 
values of these variables so that the equation is true, that is, so that both sides are equal. Appropriate 
values for $\alpha$, $\beta$, and $\gamma$ constitute a solution to Equation 2.2, which we define below. 


Definition 2.2.5 
Let 

\[ a_1x_1 + a_2x_2 + \cdots + a_nx_n = b \]

be a linear equation in $n$ variables, $x_1, x_2, \ldots, x_n$. Then 

\[ (v_1, v_2, \ldots, v_n) \in \mathbb{R}^n \]

is a solution to the linear equation if 

\[ a_1v_1 + a_2v_2 + \cdots + a_nv_n = b. \]


To understand Definition 2.2.5, we recall that when we solve an equation, we can check our solution 
by substituting it into the original equation to determine whether the equation is true. The definition 
tells us that if we do this substitution and obtain a true statement, then the ordered n-tuple of the 
substituted values is a solution. For example, $(3, 0, 1)$ is a solution to Equation 2.2 because 

\[ 8 = 0\cdot 3 + 4\cdot 0 + 8\cdot 1. \]

Notice, also, that $(1, 1, 1)$ is not a solution to Equation 2.2: 

\[ 8 = 0\cdot 1 + 4\cdot 1 + 8\cdot 1 \]

is false because the right-hand side equals 12 and $8 \neq 12$. 
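
The substitution check described above can be sketched in OCTAVE/MATLAB. The variable names a, b, v1, and v2 are our own choices for this illustration.

```octave
a  = [0 4 8];      % coefficients of alpha, beta, gamma in Equation 2.2
b  = 8;            % the constant side of Equation 2.2
v1 = [3 0 1];      % candidate solution (3, 0, 1)
v2 = [1 1 1];      % candidate solution (1, 1, 1)

s1 = a * v1';      % 0*3 + 4*0 + 8*1 = 8
s2 = a * v2';      % 0*1 + 4*1 + 8*1 = 12

fprintf('(3,0,1) gives %d, solution: %d\n', s1, s1 == b);
fprintf('(1,1,1) gives %d, solution: %d\n', s2, s2 == b);
```

A candidate is a solution exactly when the substituted value matches the constant b.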
In order for Equation 2.1 to be true, we need $\alpha$, $\beta$, and $\gamma$ so that all linear equations (corresponding 
to all pixel locations) are true. Let us write out these 16 linear equations (starting in the upper left 
corner and going across the top before moving to the second row of pixels). 
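
\[
\begin{aligned}
8 &= 0\cdot\alpha + 4\cdot\beta + 8\cdot\gamma\\
0 &= 0\cdot\alpha + 0\cdot\beta + 8\cdot\gamma\\
0 &= 8\cdot\alpha + 4\cdot\beta + 0\cdot\gamma\\
8 &= 8\cdot\alpha + 8\cdot\beta + 8\cdot\gamma\\
0 &= 0\cdot\alpha + 0\cdot\beta + 8\cdot\gamma\\
8 &= 0\cdot\alpha + 4\cdot\beta + 8\cdot\gamma\\
8 &= 8\cdot\alpha + 8\cdot\beta + 16\cdot\gamma\\
0 &= 8\cdot\alpha + 4\cdot\beta + 0\cdot\gamma\\
0 &= 8\cdot\alpha + 4\cdot\beta + 0\cdot\gamma\\
8 &= 8\cdot\alpha + 8\cdot\beta + 16\cdot\gamma\\
8 &= 0\cdot\alpha + 4\cdot\beta + 8\cdot\gamma\\
0 &= 0\cdot\alpha + 0\cdot\beta + 8\cdot\gamma\\
8 &= 8\cdot\alpha + 8\cdot\beta + 8\cdot\gamma\\
0 &= 8\cdot\alpha + 4\cdot\beta + 0\cdot\gamma\\
0 &= 0\cdot\alpha + 0\cdot\beta + 8\cdot\gamma\\
8 &= 0\cdot\alpha + 4\cdot\beta + 8\cdot\gamma
\end{aligned}
\tag{2.3}
\]

Because these equations are all related by their solution $(\alpha, \beta, \gamma)$, we call this list of equations a system 
of equations as defined below. 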


Definition 2.2.6 
Let $m, n \in \mathbb{N}$. A set of $m$ linear equations, each with $n$ variables $x_1, x_2, \ldots, x_n$, is called a system 
of equations with $m$ equations and $n$ variables. 


We say that $(v_1, v_2, \ldots, v_n) \in \mathbb{R}^n$ is a solution to the system of equations if it is simultaneously 
a solution to all equations in the system. 


Finally, the set of all solutions to a system of equations is the solution set of the system of 
equations. 


Before continuing toward a solution to the system of equations in (2.3), we should observe that 
several equations are repeated. Therefore, thankfully, we need only write the system without repeated 
equations. 


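\[
\begin{aligned}
8 &= 0\cdot\alpha + 4\cdot\beta + 8\cdot\gamma\\
0 &= 0\cdot\alpha + 0\cdot\beta + 8\cdot\gamma\\
0 &= 8\cdot\alpha + 4\cdot\beta + 0\cdot\gamma\\
8 &= 8\cdot\alpha + 8\cdot\beta + 8\cdot\gamma\\
8 &= 8\cdot\alpha + 8\cdot\beta + 16\cdot\gamma
\end{aligned}
\tag{2.4}
\]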
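
Removing the repeated equations can also be sketched in OCTAVE/MATLAB. In the hypothetical array S below, each row stores one of the sixteen pixel equations: the coefficients of $\alpha$, $\beta$, and $\gamma$ (the pixel values of Images A, D, and F), followed by the corresponding pixel value of Image C. This row layout is our own choice for the illustration.

```octave
% One row per pixel equation: [coef alpha, coef beta, coef gamma, constant].
S = [0 4  8 8;  0 0  8 0;  8 4  0 0;  8 8  8 8;
     0 0  8 0;  0 4  8 8;  8 8 16 8;  8 4  0 0;
     8 4  0 0;  8 8 16 8;  0 4  8 8;  0 0  8 0;
     8 8  8 8;  8 4  0 0;  0 0  8 0;  0 4  8 8];

U = unique(S, 'rows');   % keep one copy of each distinct row (sorted order)

disp(size(U, 1))         % number of distinct equations
disp(U)
```

Note that unique returns the rows in sorted order, so the five distinct equations may be listed in a different order than in (2.4); the solution set does not depend on the order.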


Though we were able to use symmetry to simplify this particular system of equations, you can, likely, 
imagine that most important applications using systems of equations require very large numbers 
(quite possibly thousands or even millions) of equations and variables³. In these cases, we use 
computers to find solutions. Even though technology is used for solving such problems, we find value 
in understanding and making connections between these techniques and properties of the solutions. 
So, in the next section we will describe by-hand methods for solving systems of equations. 


2.2.2 Techniques for Solving Systems of Linear Equations 


In this section, we will describe two techniques for solving systems of equations. We use these two 
techniques to solve systems like the one presented in the previous section that arose from a question 
about images. 


Method of elimination 

In this section, we solve the system of equations in (2.4) using the method of elimination. You may have 
used this method before, but we include it here to introduce some terminology to which we will refer 
in later sections. We will also give a parallel method later in this section. 


Definition 2.2.7 


Two systems of equations are said to be equivalent if they have the same solution set. 


The idea behind the method of elimination is that we seek to manipulate the equations in a system 
so that the solution is easier to obtain. Specifically, in the new system, one or more of the equations 
will be of the form $x_i = c$. Since one of the equations tells us directly the value of one of the variables, 
we can substitute that value into the other equations and the remaining, smaller system has the same 
solution (together, of course, with $x_i = c$). 

Before we solve the system in (2.4), we provide the list of allowed operations for solving a system 
of equations, using the method of elimination. 


> Allowed operations for solving systems of equations 


(1) Multiply both sides of an equation by a nonzero number. 
(2) Change one equation by adding a nonzero multiple of another equation to it. 
(3) Change the order of equations. 


You may find these operations familiar from your earlier experience solving systems of equations; 
they do not change the solution set of a system. In other words, every time we change a system using 
one of these operations, we obtain an equivalent system of equations. 


Fact 2.2.8 
Two systems of equations that differ by one or more operations allowed in the method of 
elimination (as outlined in the list above) are equivalent. 


Let us, now, use these operations to solve our example system of equations. 


³ In practice, we typically use m to represent the number of equations and n to represent the number of variables. 

Example 2.2.9 We want to solve the system of equations in (2.4). We will use annotations to describe 
which of the allowed operations were used. The convention with our annotations will be that $E_k$ 
represents the new equation $k$ and $e_k$ represents the previous equation $k$. For example, $E_3 = 2e_3 + e_1$ 
means that we will multiply the third equation by 2 and then add the first equation. The result will then 
replace the third equation. 


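\[
\begin{aligned}
8 &= 0\cdot\alpha + 4\cdot\beta + 8\cdot\gamma\\
0 &= 0\cdot\alpha + 0\cdot\beta + 8\cdot\gamma\\
0 &= 8\cdot\alpha + 4\cdot\beta + 0\cdot\gamma\\
8 &= 8\cdot\alpha + 8\cdot\beta + 8\cdot\gamma\\
8 &= 8\cdot\alpha + 8\cdot\beta + 16\cdot\gamma
\end{aligned}
\quad\longrightarrow\quad
\begin{aligned}
8 &= 4\beta + 8\gamma\\
0 &= 8\gamma\\
0 &= 8\alpha + 4\beta\\
8 &= 8\alpha + 8\beta + 8\gamma\\
8 &= 8\alpha + 8\beta + 16\gamma
\end{aligned}
\]

\[
\xrightarrow{\substack{E_1=\frac14 e_1,\ E_2=\frac18 e_2,\\ E_3=\frac14 e_3,\ E_4=\frac18 e_4,\\ E_5=\frac18 e_5}}
\quad
\begin{aligned}
2 &= \beta + 2\gamma\\
0 &= \gamma\\
0 &= 2\alpha + \beta\\
1 &= \alpha + \beta + \gamma\\
1 &= \alpha + \beta + 2\gamma
\end{aligned}
\]

\[
\xrightarrow{\substack{E_3=-2e_4+e_3\\ E_5=-e_4+e_5}}
\quad
\begin{aligned}
2 &= \beta + 2\gamma\\
0 &= \gamma\\
-2 &= -\beta - 2\gamma\\
1 &= \alpha + \beta + \gamma\\
0 &= \gamma
\end{aligned}
\]

\[
\xrightarrow{E_3=e_3+2e_2}
\quad
\begin{aligned}
2 &= \beta + 2\gamma\\
0 &= \gamma\\
-2 &= -\beta\\
1 &= \alpha + \beta + \gamma\\
0 &= \gamma.
\end{aligned}
\]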


Notice that the second and fifth equations tell us that $\gamma = 0$ and the third equation says that $\beta = 2$. 
We can use substitution in the fourth equation to find $\alpha$ as follows: 

\[ 1 = \alpha + 2 + 0 \implies -1 = \alpha. \]

There are no other possible choices of $\alpha$, $\beta$, and $\gamma$ that satisfy our system. Hence, the solution set of 
the system of equations (2.4) is $\{(-1, 2, 0)\}$. You can check that, as Fact 2.2.8 asserts, this solution is 
a solution to each of the systems in each step of the elimination performed above. 
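
One way to carry out such a check is sketched below in OCTAVE/MATLAB for the first and the last systems of the elimination. The names A0, b0, A1, b1, and x are our own; each system is stored as a coefficient array and a column of constants, with the variables ordered as $(\alpha, \beta, \gamma)$.

```octave
x  = [-1; 2; 0];                                  % (alpha, beta, gamma)

% System (2.4).
A0 = [0 4 8; 0 0 8; 8 4 0; 8 8 8; 8 8 16];
b0 = [8; 0; 0; 8; 8];

% System after the last elimination step.
A1 = [0 1 2; 0 0 1; 0 -1 0; 1 1 1; 0 0 1];
b1 = [2; 0; -2; 1; 0];

ok0 = isequal(A0 * x, b0);   % true when x satisfies every equation of (2.4)
ok1 = isequal(A1 * x, b1);   % true when x satisfies every equation of the last system
disp([ok0 ok1])
```

The intermediate systems can be checked the same way.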


We have, now, answered our original question: Image C can be written as 

\[
\text{Image C} = (-1)\,(\text{Image A}) + (2)\,(\text{Image D}) + (0)\,(\text{Image F}).
\]

Method of matrix reduction 
In this section, we first describe how to represent a system of equations using matrix notation, and then 
we give a systematic algorithm for reducing the matrix until the solution set is apparent. 

As with the previous method, we will use a model example from our image exploration to introduce 
the process. Can we represent Image A using arithmetic operations on Images D, E, and F? That is, 
we want to find $\alpha$, $\beta$, and $\gamma$ so that 

\[
\text{Image A} = \alpha\,(\text{Image D}) + \beta\,(\text{Image E}) + \gamma\,(\text{Image F}). \tag{2.5}
\]


As before, we will write a system of equations by matching up corresponding pixels in Equation 2.5, 
leaving out repeated equations. The system is shown below: 


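\[
\begin{aligned}
4\alpha + 0\beta + 8\gamma &= 0\\
0\alpha + 4\beta + 8\gamma &= 0\\
4\alpha + 8\beta + 0\gamma &= 8\\
8\alpha + 12\beta + 8\gamma &= 8\\
8\alpha + 12\beta + 16\gamma &= 8
\end{aligned}
\tag{2.6}
\]

To use matrix notation, we store the coefficients and the constants of the equations in an augmented 
matrix as follows: 

\[
\left[\begin{array}{ccc|c}
4 & 0 & 8 & 0\\
0 & 4 & 8 & 0\\
4 & 8 & 0 & 8\\
8 & 12 & 8 & 8\\
8 & 12 & 16 & 8
\end{array}\right]
\tag{2.7}
\]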


Notice that the entries to the left of the vertical line (or augment) in the matrix are the coefficients of 
the variables in the linear equations. Notice also that the entries to the right of the augment are the 
right-hand sides (constant coefficients) in the equations. 
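
In OCTAVE/MATLAB, an augmented matrix can be assembled by placing the column of constants next to the array of coefficients. The sketch below does this for the system in (2.6); the names A, b, and M are our own, and note that the software stores no augment line, so we must remember which column holds the constants.

```octave
A = [4 0 8; 0 4 8; 4 8 0; 8 12 8; 8 12 16];   % coefficients of alpha, beta, gamma
b = [0; 0; 8; 8; 8];                          % right-hand sides
M = [A b];                                    % augmented matrix, as in (2.7)

disp(M)
disp(size(M))    % 5 rows and 3 + 1 = 4 columns
```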

In general, the system of equations with m equations in n variables 


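\[
\begin{aligned}
a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n &= b_1\\
a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n &= b_2\\
&\ \,\vdots\\
a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,n}x_n &= b_m,
\end{aligned}
\]

has a corresponding augmented matrix with $m$ rows and $n + 1$ columns, and is written as 

\[
\left[\begin{array}{cccc|c}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} & b_1\\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} & b_2\\
\vdots & \vdots & & \vdots & \vdots\\
a_{m,1} & a_{m,2} & \cdots & a_{m,n} & b_m
\end{array}\right].
\]

We will reduce the augmented matrix corresponding to the system in (2.6) using a set of allowed 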
row operations. As you will see, the key fact (presented below) about row operations on an augmented 
matrix is that they do not change the solution set of the corresponding system. 

Matrix reduction is nearly the same as the elimination method for systems of equations. In fact, the 
only difference between the two methods is that, instead of working with a system that has equations, 
you work with a matrix that has rows. Here, we write the allowed steps to standardize this process. 


> Row operations to reduce a matrix The following are operations allowed to be used when reducing 
a matrix. 


(1) Multiply a row by a nonzero number. 
(2) Change one row by adding a nonzero multiple of another row to it. 
(3) Change the order of rows. 
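
Each of the three row operations can be sketched as a single line of OCTAVE/MATLAB acting on a stored matrix. The particular rows and multiples below are chosen only for illustration, starting from the augmented matrix in (2.7).

```octave
M = [4 0 8 0; 0 4 8 0; 4 8 0 8; 8 12 8 8; 8 12 16 8];   % matrix (2.7)

M(1, :) = (1/4) * M(1, :);            % (1) multiply row 1 by the nonzero number 1/4
M(3, :) = M(3, :) + (-4) * M(1, :);   % (2) add -4 times row 1 to row 3
M([4 5], :) = M([5 4], :);            % (3) interchange rows 4 and 5

disp(M)
```

Here M(k, :) refers to all entries of row k, so each assignment replaces an entire row at once.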


When solving systems of equations using elimination, we discussed equivalent systems. We have a 
similar definition for matrices. The following definition gives us terminology for equivalence between 
matrices that should not be confused with equality. 


Definition 2.2.10 


Let $P$ and $Q$ be two $m \times (n + 1)$ augmented matrices, corresponding to systems with $m$ equations 
in $n$ variables. If $Q$ can be obtained by performing a finite sequence of row operations on $P$, then 
$P$ is row equivalent to $Q$, which we denote 


\[ P \sim Q. \]


In Definition 2.2.10, we specify that for two matrices to be row equivalent, we require that the 
matrix reduction process begins at P and terminates at Q after only finitely many row operations. It is 
important to recognize that row operations can be performed on non-augmented matrices. So, although 
we then don’t have the interpretation as corresponding to a system of equations, we can still make 
sense of row equivalent non-augmented matrices. 

In Exercise 33, you will show that row operations can be reversed, that is, if P ~ Q, then Q ~ P. 
Hence, rather than writing “P is row equivalent to Q,” it makes sense to write “P and Q are row 
equivalent.” 

Now we can restate Fact 2.2.8 in terms of augmented matrices. 


Fact 2.2.11 
Two augmented matrices that are row equivalent correspond to equivalent systems. 


Clearly, any given matrix (augmented or not) is row equivalent to many other matrices. In order 
to determine whether two matrices are row equivalent, we need to be able to use row operations to 
transform one matrix into the other. Alternatively, we could use row operations to transform both 
matrices into the same third matrix. 

Lemma 2.2.12 
Let $P$, $Q$, and $R$ be three $m \times n$ matrices and suppose that $P \sim R$ and $Q \sim R$. Then $P \sim Q$. 


The proof is Exercise 34. 

We now return to pursuing our overall goal of finding the solution set of a given linear system. We 
will use row operations to get the corresponding augmented matrix into a format for which we can 
readily find the solution set; this solution set is also the solution set for the original system. 

What simple format might we require for the row equivalent matrix? There are many possible forms 
we might choose, but we will define and focus on two in this text: echelon form and reduced echelon 
form. Both forms include conditions on the leading entries of the matrix. 


Definition 2.2.13 


The leading entries in a matrix are the first nonzero entries in each row, when reading from left to 
right. 


For example, the leading entries in the matrix $P$ below are boxed. 

\[
P = \left[\begin{array}{cccc|c}
0 & \boxed{1} & 2 & 4 & 0\\
\boxed{2} & 0 & 1 & 5 & 2\\
0 & 0 & 0 & \boxed{1} & 1
\end{array}\right]
\]


Definition 2.2.14 
A matrix is in echelon form if the following three statements are true: 


• All leading entries are 1. 
• Each leading entry is to the right of the leading entry in the row above it. 
• Any row of zeros is below all rows that are not all zero. 


The matrix P, above, is not in echelon form. Two of the three criteria above are not true. First, 2 is 
a leading entry so the first rule is not true. Second, the leading entry in the second row is to the left of 
the leading entry in the first row, violating the second rule. 

However, if we start with $P$, then multiply row two by 1/2 and then switch the first two rows, we get 

\[
Q = \left[\begin{array}{cccc|c}
1 & 0 & 1/2 & 5/2 & 1\\
0 & 1 & 2 & 4 & 0\\
0 & 0 & 0 & 1 & 1
\end{array}\right],
\]


which is in echelon form. It is not just a coincidence that we could row reduce P to echelon form; 
in fact, every matrix can be put into echelon form. It is indeed convenient that every matrix is row 
equivalent to a matrix that is in echelon form, and you should be able to see that an augmented matrix 
in echelon form corresponds to a simpler system to solve. 
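
The passage from $P$ to $Q$, and the location of the leading entries, can be sketched in OCTAVE/MATLAB. The matrix P below is entered as implied by the description of $Q$ (undoing the row switch and the multiplication by 1/2); the name lead and the loop are our own illustration.

```octave
P = [0 1 2 4 0; 2 0 1 5 2; 0 0 0 1 1];

Q = P;
Q(2, :) = (1/2) * Q(2, :);     % multiply row two by 1/2
Q([1 2], :) = Q([2 1], :);     % switch the first two rows

% Column index of the leading entry in each row (0 for a row of zeros).
lead = zeros(1, size(Q, 1));
for i = 1:size(Q, 1)
  j = find(Q(i, :) ~= 0, 1);   % first nonzero entry, reading left to right
  if ~isempty(j)
    lead(i) = j;
  end
end

disp(Q)
disp(lead)    % shows 1 2 4: each leading entry is to the right of the one above
```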

However, augmented matrices in echelon form are not the simplest to solve, and moreover, echelon 
form is not unique. For example, both the matrices R and S below are also in echelon form and are 
row equivalent to Q. 




i —1 —3/2 1/372 
R=[01 2 6/2], 
00 0 1]1 


\[
S = \left[\begin{array}{cccc|c}
1 & 0 & 1/2 & 0 & -3/2\\
0 & 1 & 2 & 0 & -4\\
0 & 0 & 0 & 1 & 1
\end{array}\right].
\]


Since the matrices P, Q, R, and S are all row equivalent, they correspond to equivalent systems. 
Which would you rather work with? What makes your choice nicer to solve than the others? Likely, 
what you are discovering is that matrices corresponding to systems that are quite easy to solve are 
matrices in reduced echelon form. 


Definition 2.2.15 


A matrix is said to be in reduced echelon form if the matrix is in echelon form and all entries above 
each leading entry are zeros. 


The matrix S above is in reduced echelon form, but the matrices Q and R, although in echelon form, 
do not have only zeros above and below each leading entry. Therefore, Q and R are not in reduced 
echelon form. The following is an algorithm (list of steps) that will row reduce a matrix to reduced 
echelon form. 


> Row Reduction Steps to perform Row Reduction on an m × n matrix. 
Start with r = 1 and c = 1. 


1. Find a leading entry in column c. 
     If possible: Go to step 2. 
     If not possible: Increase c by 1. 
          If c ≤ n: Go to step 1. 
          If c > n: Algorithm finished: matrix is in reduced echelon form. 
2. Arrange the rows in an order so that there is a leading entry in row r and column c. 
3. Use row operations (1) or (2) to make the leading entry in row r a leading 1, without changing 
   any rows above row r. 
4. Use the leading 1 and row operation (2) to get a 0 in each of the entries above and below the 
   leading 1. 
5. Increase r by 1 and increase c by 1. 
6. Go to step 1. 
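
The steps above can be sketched in OCTAVE/MATLAB as a short loop. This is only one possible arrangement, written under the assumption that the search in step 1 looks for a nonzero entry of column c in row r or below; the names M, r, c, and p are our own. The example matrix is entered from the description of $P$ and $Q$ earlier in this section.

```octave
M = [0 1 2 4 0; 2 0 1 5 2; 0 0 0 1 1];   % the matrix P, stored without its augment line
[m, n] = size(M);
r = 1;
for c = 1:n
  if r > m
    break;                               % every row already holds a leading 1
  end
  % Step 1: look for a nonzero entry in column c, in row r or below.
  p = find(M(r:m, c) ~= 0, 1) + r - 1;
  if isempty(p)
    continue;                            % none found: move on to the next column
  end
  M([r p], :) = M([p r], :);             % Step 2: bring that row up to row r
  M(r, :) = M(r, :) / M(r, c);           % Step 3: scale to get a leading 1
  for i = [1:r-1, r+1:m]                 % Step 4: clear the rest of column c
    M(i, :) = M(i, :) - M(i, c) * M(r, :);
  end
  r = r + 1;                             % Step 5: the for loop advances c
end
disp(M)
```

For this input the loop ends with the rows 1 0 1/2 0 -3/2, then 0 1 2 0 -4, then 0 0 0 1 1. Computer arithmetic is approximate in general, so for other matrices the entries may carry small rounding errors.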


There are many other sequences of row operations that could be used to put a matrix into reduced 
echelon form, but they all lead to the same result, as the next theorem shows. 


Theorem 2.2.16 
Each matrix is row equivalent to exactly one matrix in reduced echelon form. 

The proof of Theorem 2.2.16 can easily distract us from our applications. Therefore, we leave the 
proof out of the text, but encourage the reader to explore how such a proof can be written or to read 
one of many versions already available. 
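
Theorem 2.2.16, together with Lemma 2.2.12, suggests a practical test: two matrices of the same size are row equivalent exactly when they have the same reduced echelon form. A minimal sketch in OCTAVE/MATLAB uses the built-in function rref, which returns the reduced echelon form of a matrix. The matrices are entered from the $P$, $Q$, and $S$ discussed earlier in this section, and the tolerance tol is our own allowance for rounding.

```octave
P = [0 1 2 4 0; 2 0 1 5 2; 0 0 0 1 1];
Q = [1 0 1/2 5/2 1; 0 1 2 4 0; 0 0 0 1 1];
S = [1 0 1/2 0 -3/2; 0 1 2 0 -4; 0 0 0 1 1];

tol = 1e-12;                                     % allowance for rounding error
sameP = max(max(abs(rref(P) - S))) < tol;        % does P reduce to S?
sameQ = max(max(abs(rref(Q) - S))) < tol;        % does Q reduce to S?
disp([sameP sameQ])
```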


Example 2.2.17 To illustrate matrix reduction, we will reduce the matrix in (2.7) to echelon form. 
We will annotate our steps in a way similar to our annotations in the method of elimination. Here, $R_k$ 
will represent the new row $k$ and $r_k$ will represent the old row $k$. 


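\[
\left[\begin{array}{ccc|c}
4 & 0 & 8 & 0\\
0 & 4 & 8 & 0\\
4 & 8 & 0 & 8\\
8 & 12 & 8 & 8\\
8 & 12 & 16 & 8
\end{array}\right]
\xrightarrow{\substack{R_1=\frac14 r_1\\ R_2=\frac14 r_2}}
\left[\begin{array}{ccc|c}
1 & 0 & 2 & 0\\
0 & 1 & 2 & 0\\
4 & 8 & 0 & 8\\
8 & 12 & 8 & 8\\
8 & 12 & 16 & 8
\end{array}\right]
\]

\[
\xrightarrow{\substack{R_3=-4r_1+r_3\\ R_4=-8r_1+r_4\\ R_5=-8r_1+r_5}}
\left[\begin{array}{ccc|c}
1 & 0 & 2 & 0\\
0 & 1 & 2 & 0\\
0 & 8 & -8 & 8\\
0 & 12 & -8 & 8\\
0 & 12 & 0 & 8
\end{array}\right]
\xrightarrow{\substack{R_3=-8r_2+r_3\\ R_4=-12r_2+r_4\\ R_5=-12r_2+r_5}}
\left[\begin{array}{ccc|c}
1 & 0 & 2 & 0\\
0 & 1 & 2 & 0\\
0 & 0 & -24 & 8\\
0 & 0 & -32 & 8\\
0 & 0 & -24 & 8
\end{array}\right]
\]

\[
\xrightarrow{\substack{R_3=-\frac{1}{24}r_3\\ R_4=-\frac{1}{32}r_4\\ R_5=-r_3+r_5}}
\left[\begin{array}{ccc|c}
1 & 0 & 2 & 0\\
0 & 1 & 2 & 0\\
0 & 0 & 1 & -1/3\\
0 & 0 & 1 & -1/4\\
0 & 0 & 0 & 0
\end{array}\right]
\xrightarrow{R_4=-r_3+r_4}
\left[\begin{array}{ccc|c}
1 & 0 & 2 & 0\\
0 & 1 & 2 & 0\\
0 & 0 & 1 & -1/3\\
0 & 0 & 0 & 1/12\\
0 & 0 & 0 & 0
\end{array}\right]
\]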


We have reduced the original augmented matrix to echelon form. What additional row operations⁴ 
would be used to put the matrix into reduced echelon form? 


The system of equations corresponding to the echelon form found in Example 2.2.17 is 


\[
\begin{aligned}
\alpha + 2\gamma &= 0\\
\beta + 2\gamma &= 0\\
\gamma &= -1/3\\
0 &= 1/12\\
0 &= 0.
\end{aligned}
\]


Clearly, the fourth equation is false. This means that we cannot find any $\alpha$, $\beta$, and $\gamma$ so that Equation 2.5 
is true. In this case, we would say the system in (2.6) is inconsistent, or has no solution. 
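
An inconsistent system can be recognized in software by the same kind of row: all zeros to the left of the augment and a nonzero entry to its right. The following OCTAVE/MATLAB sketch looks for such a row in the reduced echelon form returned by the built-in function rref. The names R, bad, and inconsistent, and the tolerance tol, are our own.

```octave
M = [4 0 8 0; 0 4 8 0; 4 8 0 8; 8 12 8 8; 8 12 16 8];   % augmented matrix (2.7)
R = rref(M);                                            % reduced echelon form

tol = 1e-10;                                            % allowance for rounding error
% A row (0 0 0 | nonzero) encodes a false equation such as 0 = 1.
bad = all(abs(R(:, 1:3)) < tol, 2) & (abs(R(:, 4)) > tol);
inconsistent = any(bad);

disp(R)
disp(inconsistent)
```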
Let us consider one more example similar to those above. 


Example 2.2.18 Let us, now, ask whether Image A can be written using arithmetic combinations of 
any of the other 5 images. That is, can we find $\alpha$, $\beta$, $\gamma$, $\tau$, and $\delta$ so that 

\[
\text{Image A} = \alpha\,(\text{Image B}) + \beta\,(\text{Image C}) + \gamma\,(\text{Image D}) + \tau\,(\text{Image E}) + \delta\,(\text{Image F})? \tag{2.8}
\]


⁴ Reduced echelon form is particularly useful if you want to find a solution to the system. In this case, since the system 
has no solution, we did not present these additional row operations. 


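Again, we set up the system of equations corresponding to Equation (2.8). Without repeated equations, 
the corresponding system of linear equations is 

\[
\begin{aligned}
8\beta + 4\gamma + 8\delta &= 0\\
8\alpha + 4\tau + 8\delta &= 0\\
4\gamma + 8\tau &= 8\\
8\alpha + 8\beta + 8\gamma + 12\tau + 8\delta &= 8\\
8\alpha + 8\beta + 8\gamma + 12\tau + 16\delta &= 8.
\end{aligned}
\tag{2.9}
\]

We will use matrix reduction to find a solution $(\alpha, \beta, \gamma, \tau, \delta)$ to the system in (2.9). The corresponding 
augmented matrix is 

\[
\left[\begin{array}{ccccc|c}
0 & 8 & 4 & 0 & 8 & 0\\
8 & 0 & 0 & 4 & 8 & 0\\
0 & 0 & 4 & 8 & 0 & 8\\
8 & 8 & 8 & 12 & 8 & 8\\
8 & 8 & 8 & 12 & 16 & 8
\end{array}\right].
\tag{2.10}
\]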

We will reduce this matrix to reduced echelon form. To begin, we will interchange rows so that we 
begin with the current rows 2, 4, and 5 at the top, without changing the order in any other way. In our 
second step, we will multiply every row by 1/4. 


8004 8|0 2001 2I0 
88812 8|8 2993912 
88812 16/8] — |22234]|2 
0840 8|0 02102/0 
0048 ols 00120/2 

200120 200120 

poe_nany (02229121 pany [2111 0]! 

eS ope es i ob i 

RENEE ORO | 10210210 

00120)2 00120)2 

2001 2|0 2001 20 

roe_min [OLE OfL] , fOLLL Of 

—#*? 10000 1 \0| == 10012 0 \2 

Mrs 1012 —-2/2] Ro [0000 1 [0 

0012 0 (2 0012-2\2 
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200 12] 0 2001 0] 0 
tee_man JOLO-10-1] , (910 -10/-1 
ee Oot 3-0 Be Oot oO) 3 
Bs=3—"s 1900 0 1| of = “tS 1000 0 1] 0 
000 0 2] 0 0000 0] 0 
1001/20] 0 
cee (OO LOR 
ee oad 3.6) 2 
000 0 1| 0 
000 0 0] 0 


The corresponding system of equations for the reduced echelon form of the matrix in (2.10) is 

\[
\begin{aligned}
\alpha + \tfrac12\tau &= 0\\
\beta - \tau &= -1\\
\gamma + 2\tau &= 2\\
\delta &= 0\\
0 &= 0.
\end{aligned}
\tag{2.11}
\]

From this system, we can deduce that $\delta = 0$ and $\alpha$, $\beta$, and $\gamma$ can be obtained for a given $\tau$. That is, for 
each real number we choose for $\tau$, we get a solution. Some example solutions are 

    $(0, -1, 2, 0, 0)$      (for $\tau = 0$) 
    $(-1/2, 0, 0, 1, 0)$    (for $\tau = 1$) 
    $(-1, 1, -2, 2, 0)$     (for $\tau = 2$). 


Since we can choose any real value for $\tau$, we see that there are infinitely many solutions to Equation 2.8. 
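
As a quick numerical check of this claim, the sketch below builds the solution determined by several choices of $\tau$ and substitutes it into the system (2.9). The formulas for the entries of x come from solving the reduced system for $\alpha$, $\beta$, and $\gamma$ in terms of $\tau$; the names A, b, x, t, and ok are our own, and the sample values of $\tau$ are arbitrary.

```octave
% Coefficients of (alpha, beta, gamma, tau, delta) and constants in system (2.9).
A = [0 8 4 0 8; 8 0 0 4 8; 0 0 4 8 0; 8 8 8 12 8; 8 8 8 12 16];
b = [0; 0; 8; 8; 8];

ok = true;
for t = [0 1 2 -3.5 10]                  % a few sample values of tau
  x = [-t/2; -1 + t; 2 - 2*t; t; 0];     % solution determined by this tau
  ok = ok && isequal(A * x, b);          % does x satisfy every equation?
end
disp(ok)
```

Checking finitely many values of $\tau$ does not prove the claim, but the same substitution carried out by hand with $\tau$ left as a symbol does.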


In Example 2.2.18, we had five variables in our system of equations. The corresponding augmented 
matrix had a column, to the left of the augment, corresponding to each variable. In the reduction 
process, one column did not have a leading entry. This happened to be the column corresponding to the 
variable $\tau$. So, the solutions depend on the choice of $\tau$. In cases like this, we say $\tau$ is a free variable. 
There are a few items to note about such scenarios. The system of equations in Example 2.2.18 need 
not be written with the variables in the order we wrote them. Indeed, the system is still the same system 
if we rewrite it by moving the $\alpha$ term in all equations to be the fourth term instead of the first: 

\[
\begin{aligned}
8\beta + 4\gamma + 8\delta &= 0\\
4\tau + 8\alpha + 8\delta &= 0\\
4\gamma + 8\tau &= 8\\
8\beta + 8\gamma + 12\tau + 8\alpha + 8\delta &= 8\\
8\beta + 8\gamma + 12\tau + 8\alpha + 16\delta &= 8.
\end{aligned}
\tag{2.12}
\]


Matrix reduction (we will not show the steps) leads to the reduced echelon form 


2.2 Systems of Equations 


27 


8400 8/0 100 2 O/-1 
00 4 8 8|0 010-40) 2 
0480 0|/8| ~ 7001 2 0} 0 (2.13) 
88128 8/8 000 0 1} 0 
8 8 12 8 16/8 000 0 0} 0 


We can write the system of equations corresponding to the reduced echelon form because we know in
which order the variables were in the system. Indeed, the corresponding system is


\[
\begin{aligned}
\beta + 2\alpha &= -1\\
\gamma - 4\alpha &= 2\\
\tau + 2\alpha &= 0\\
\delta &= 0\\
0 &= 0.
\end{aligned}
\tag{2.14}
\]


In this case, we say that $\alpha$ is a free variable because we can choose any real number for $\alpha$ and obtain
a solution.
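As a rough check of this reduction, the sketch below row-reduces the reordered augmented matrix with exact arithmetic and reports which column lacks a leading entry. The helper `rref` and the list `names` are our own illustrative names, and the column order $\beta, \gamma, \tau, \alpha, \delta$ is the one assumed for the reordered system.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced echelon form, pivot column indices) using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


# Columns ordered beta, gamma, tau, alpha, delta, then the right-hand side.
names = ["beta", "gamma", "tau", "alpha", "delta"]
augmented = [
    [8, 4, 0, 0, 8, 0],
    [0, 0, 4, 8, 8, 0],
    [0, 4, 8, 0, 0, 8],
    [8, 8, 12, 8, 8, 8],
    [8, 8, 12, 8, 16, 8],
]

reduced, pivots = rref(augmented)
free = [names[c] for c in range(len(names)) if c not in pivots]
for row in reduced:
    print([str(x) for x in row])
print("free variables:", free)
```

With this column order the only column without a leading entry is the fourth, so the routine reports `alpha` as free; putting the columns back in the original order would instead report `tau`.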


Not only can we rewrite the system of equations with the variables in a different order, we can 
use different elimination steps (or row operations in the case of matrix reduction). Indeed, the matrix 
corresponding to the system of equations in Example 2.2.18 can be reduced as follows: 


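\[
\left(\begin{array}{ccccc|c}
8 & 0 & 0 & 4 & 8 & 0\\
8 & 8 & 8 & 12 & 8 & 8\\
8 & 8 & 8 & 12 & 16 & 8\\
0 & 8 & 4 & 0 & 8 & 0\\
0 & 0 & 4 & 8 & 0 & 8
\end{array}\right)
\xrightarrow{\substack{R_1 = (1/4)r_1\\ R_2 = -r_1 + r_2\\ R_3 = -r_1 + r_3}}
\left(\begin{array}{ccccc|c}
2 & 0 & 0 & 1 & 2 & 0\\
0 & 8 & 8 & 8 & 0 & 8\\
0 & 8 & 8 & 8 & 8 & 8\\
0 & 8 & 4 & 0 & 8 & 0\\
0 & 0 & 4 & 8 & 0 & 8
\end{array}\right)
\xrightarrow{\substack{R_2 = -r_4 + r_2\\ R_3 = -r_4 + r_3}}
\left(\begin{array}{ccccc|c}
2 & 0 & 0 & 1 & 2 & 0\\
0 & 0 & 4 & 8 & -8 & 8\\
0 & 0 & 4 & 8 & 0 & 8\\
0 & 8 & 4 & 0 & 8 & 0\\
0 & 0 & 4 & 8 & 0 & 8
\end{array}\right)
\]

\[
\xrightarrow{\substack{R_2 = (1/4)r_2\\ R_3 = -r_2 + r_3\\ R_4 = -r_2 + r_4\\ R_5 = -r_2 + r_5}}
\left(\begin{array}{ccccc|c}
2 & 0 & 0 & 1 & 2 & 0\\
0 & 0 & 1 & 2 & -2 & 2\\
0 & 0 & 0 & 0 & 8 & 0\\
0 & 8 & 0 & -8 & 16 & -8\\
0 & 0 & 0 & 0 & 8 & 0
\end{array}\right)
\xrightarrow{\substack{R_1 = -(1/4)r_3 + r_1\\ R_2 = (1/4)r_3 + r_2\\ R_4 = -2r_3 + r_4\\ R_5 = -r_3 + r_5}}
\left(\begin{array}{ccccc|c}
2 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 1 & 2 & 0 & 2\\
0 & 0 & 0 & 0 & 8 & 0\\
0 & 8 & 0 & -8 & 0 & -8\\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right)
\]

\[
\xrightarrow{\substack{R_2 = -2r_1 + r_2\\ R_3 = (1/8)r_3\\ R_4 = 8r_1 + r_4}}
\left(\begin{array}{ccccc|c}
2 & 0 & 0 & 1 & 0 & 0\\
-4 & 0 & 1 & 0 & 0 & 2\\
0 & 0 & 0 & 0 & 1 & 0\\
16 & 8 & 0 & 0 & 0 & -8\\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right)
\xrightarrow{R_4 = (1/8)r_4}
\left(\begin{array}{ccccc|c}
2 & 0 & 0 & 1 & 0 & 0\\
-4 & 0 & 1 & 0 & 0 & 2\\
0 & 0 & 0 & 0 & 1 & 0\\
2 & 1 & 0 & 0 & 0 & -1\\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right)
\]

The corresponding system of equations is


\[
\begin{aligned}
2\alpha + \tau &= 0\\
-4\alpha + \gamma &= 2\\
\delta &= 0\\
2\alpha + \beta &= -1\\
0 &= 0.
\end{aligned}
\]


In this case, we want to choose $\alpha$ to be free because $\alpha$ appears in more than one equation.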


Fact 2.2.19 
The reader should recognize that, in the case of infinitely many solutions, the variables that we
say are free depend on how we choose to eliminate in the system or reduce the matrix.


It should also be noted that one variable in Example 2.2.18 is never free. Do you see which one? 
(See Exercise 28.) 

As mathematicians, we always want to observe patterns in our results, in the hope of making the next
similar problem more efficient. So, let us return to our first two examples. First, we wanted to find a
solution $(\alpha, \beta, \gamma)$ to the system in (2.4). We found one solution. Let us look at the process using
matrix reduction. Consider the augmented matrix corresponding to System 2.4, the augmented matrix
corresponding to our last step in elimination, and the reduced echelon form of this matrix,


\[
\left(\begin{array}{ccc|c}
0 & 4 & 8 & 8\\
0 & 0 & 8 & 0\\
8 & 4 & 0 & 0\\
8 & 8 & 8 & 8\\
8 & 8 & 16 & 8
\end{array}\right)
\sim
\left(\begin{array}{ccc|c}
0 & 1 & 2 & 2\\
0 & 0 & 1 & 0\\
0 & -1 & 0 & -2\\
1 & 1 & 1 & 1\\
0 & 0 & 1 & 0
\end{array}\right)
\sim
\left(\begin{array}{ccc|c}
1 & 0 & 0 & -1\\
0 & 1 & 0 & 2\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{array}\right).
\tag{2.15}
\]


In the reduced echelon form, the right column holds the solution because the matrix corresponds to the
system
$\alpha = -1$, $\beta = 2$, $\gamma = 0$ (and two equations that say $0 = 0$).


Notice that there were as many leading entries to the left of the augment line as there are variables. 
This is key! Exercise 29 asks how you know this will always lead to a unique solution. 

Second, we sought a solution to Equation 2.5. The reduced form of the corresponding matrix had 
a leading entry to the right of the augment line. This is key to recognizing that there is no solution. 
Exercise 30 asks you to explain how this always tells us there is no solution. 

Finally, in Example 2.2.18, all leading entries were to the left of the augment line, but there were 
fewer leading entries than variables. In this scenario, we see there are infinitely many solutions. 
Exercise 31 asks you to explain this scenario. 

It turns out that these are the only types of solution results for systems of linear equations. We put
this all together in the next theorem.


Theorem 2.2.20
Consider the system of $m$ equations with $n$ variables


\[
\begin{aligned}
a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n &= b_1\\
a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n &= b_2\\
&\;\;\vdots\\
a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,n}x_n &= b_m.
\end{aligned}
\tag{2.16}
\]

Let $[M \mid b]$ be the corresponding augmented coefficient matrix with

\[
M = \begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n}\\
a_{2,1} & a_{2,2} & \cdots & a_{2,n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{pmatrix}
\quad\text{and}\quad
b = \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{pmatrix}.
\tag{2.17}
\]


Then the following statements hold.


1. The system of equations in (2.16) has a unique solution if and only if the number of leading
entries in the reduced echelon forms of both $M$ and $[M \mid b]$ is $n$.

2. The system of equations in (2.16) has no solution if the number of leading entries in the
reduced echelon form of $M$ is less than the number of leading entries in the reduced echelon
form of $[M \mid b]$.

3. The system of equations in (2.16) has infinitely many solutions if and only if the number of
leading entries in the reduced echelon forms of $M$ and $[M \mid b]$ is equal, but less than $n$.


If we consider a system of equations with $n$ equations and $n$ variables, then Theorem 2.2.20 tells us
that if we reduce $M$ and find a leading entry in every column, then the system of equations
corresponding to $[M \mid b]$ has a unique solution, no matter what $b$ is. The theorem also tells us that
if $m < n$, then the system cannot have a unique solution: the reduced echelon form of $M$ has at most
one leading entry per row, so it has at most $m < n$ leading entries.
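The three cases of Theorem 2.2.20 can be sketched as a small routine that counts leading entries. This is a minimal illustration under our own naming (`leading_entries`, `classify`); the first two test systems use the coefficients that appear in this section, and the third is a made-up inconsistent system.

```python
from fractions import Fraction


def leading_entries(rows):
    """Count the leading entries in the reduced echelon form of a matrix."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def classify(M, b):
    """Apply the three cases of Theorem 2.2.20 to the system [M | b]."""
    n = len(M[0])
    in_M = leading_entries(M)
    in_augmented = leading_entries([row + [rhs] for row, rhs in zip(M, b)])
    if in_M < in_augmented:
        return "no solution"
    if in_M == n:
        return "unique solution"
    return "infinitely many solutions"


# Coefficients of System 2.4 as they appear in (2.15).
print(classify([[0, 4, 8], [0, 0, 8], [8, 4, 0], [8, 8, 8], [8, 8, 16]],
               [8, 0, 0, 8, 8]))
# Coefficients of Example 2.2.18.
print(classify([[8, 0, 0, 4, 8], [8, 8, 8, 12, 8], [8, 8, 8, 12, 16],
                [0, 8, 4, 0, 8], [0, 0, 4, 8, 0]],
               [0, 8, 8, 0, 8]))
# Hypothetical inconsistent system: x + y = 1 and x + y = 2.
print(classify([[1, 1], [1, 1]], [1, 2]))
```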

The phrase “if and only if” is a way of saying that two statements are equivalent. We discuss 
statements with this phrase more in Appendix C. This language will occur often in this text. 


2.2.3 Elementary Matrix 


In this section, we will briefly connect matrix reduction to a set of matrix products.⁵ This connection
will prove useful later. To define an elementary matrix, we begin with the identity matrix.


Definition 2.2.21


The $n \times n$ identity matrix, $I_n$, is the matrix that satisfies $I_n M = M I_n = M$ for all $n \times n$ matrices
$M$.


⁵ For the definition and examples of matrix products, see Section 3.1.2.

One can show, by inspection, that $I_n$ must be the matrix with $n$ 1's down the diagonal and 0's
everywhere else:


\[
I_n = \begin{pmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & & 0\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}.
\]

Definition 2.2.22


An $n \times n$ elementary matrix $E$ is a matrix that can be obtained by performing one row operation
on $I_n$.


Let us give a couple of examples of elementary matrices before we give some results.


Example 2.2.23 The following are $3 \times 3$ elementary matrices:


• $E_1 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix}$ is obtained by changing the order of rows 2 and 3 of the identity matrix.

• $E_2 = \begin{pmatrix} 2 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}$ is obtained by multiplying row 1 of $I_3$ by 2.

• $E_3 = \begin{pmatrix} 1 & 0 & 0\\ 3 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}$ is obtained by adding 3 times row 1 to row 2.

The matrix $M = \begin{pmatrix} 2 & 0 & 0\\ -3 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}$ cannot be obtained by performing a single row operation on $I_3$, so it is not an
elementary matrix.


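Let us now see how these are related to matrix reduction. Consider the following example:


Example 2.2.24 Let $M = \begin{pmatrix} 2 & 3 & 5\\ 1 & 2 & 1\\ 3 & 4 & -1 \end{pmatrix}$. Let us see what happens when we multiply by each of the
elementary matrices in Example 2.2.23.


\[
E_1 M = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 5\\ 1 & 2 & 1\\ 3 & 4 & -1 \end{pmatrix}
= \begin{pmatrix} 2 & 3 & 5\\ 3 & 4 & -1\\ 1 & 2 & 1 \end{pmatrix}.
\]


The matrix multiplication results in $M$ altered by interchanging rows 2 and 3.


\[
E_2 M = \begin{pmatrix} 2 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 5\\ 1 & 2 & 1\\ 3 & 4 & -1 \end{pmatrix}
= \begin{pmatrix} 4 & 6 & 10\\ 1 & 2 & 1\\ 3 & 4 & -1 \end{pmatrix}.
\]


The matrix multiplication results in $M$ altered by multiplying row 1 by 2.


\[
E_3 M = \begin{pmatrix} 1 & 0 & 0\\ 3 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 5\\ 1 & 2 & 1\\ 3 & 4 & -1 \end{pmatrix}
= \begin{pmatrix} 2 & 3 & 5\\ 7 & 11 & 16\\ 3 & 4 & -1 \end{pmatrix}.
\]


The matrix multiplication results in $M$ altered by adding 3 times row 1 to row 2.
Indeed, multiplying a matrix $M$ on the left by an elementary matrix has the same result as performing the
corresponding row operation!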
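The observation of Example 2.2.24 can be checked mechanically. The sketch below stores matrices as lists of rows and compares each product with the corresponding row operation; the helper names (`matmul`, `swap_rows`, `scale_row`, `add_multiple`) are our own and are not notation from the text.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def swap_rows(A, i, j):
    """Return a copy of A with rows i and j interchanged."""
    rows = [row[:] for row in A]
    rows[i], rows[j] = rows[j], rows[i]
    return rows


def scale_row(A, i, c):
    """Return a copy of A with row i multiplied by c."""
    rows = [row[:] for row in A]
    rows[i] = [c * x for x in rows[i]]
    return rows


def add_multiple(A, c, src, dst):
    """Return a copy of A with c times row src added to row dst."""
    rows = [row[:] for row in A]
    rows[dst] = [x + c * y for x, y in zip(rows[dst], rows[src])]
    return rows


M = [[2, 3, 5], [1, 2, 1], [3, 4, -1]]
E1 = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
E2 = [[2, 0, 0], [0, 1, 0], [0, 0, 1]]
E3 = [[1, 0, 0], [3, 1, 0], [0, 0, 1]]

# Rows are indexed from 0 in the code, so "row 2" of the text is index 1.
print(matmul(E1, M) == swap_rows(M, 1, 2))
print(matmul(E2, M) == scale_row(M, 0, 2))
print(matmul(E3, M) == add_multiple(M, 3, 0, 1))
```

Each comparison should print `True`; the same helpers can be reused to test any other elementary matrix.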


In fact, we could reduce a matrix to echelon or reduced echelon form using multiplication by 
elementary matrices. Indeed, consider the following example. 


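Example 2.2.25 Consider the problem of reducing the matrix $M = \begin{pmatrix} 1 & 2 & 4\\ 2 & 4 & 1\\ 1 & 1 & 1 \end{pmatrix}$ to reduced echelon
form. A possible first step is to replace row 2 by multiplying row 1 by $-2$ and adding it to row 2. This
is the same as multiplying $M$ on the left by $E_1 = \begin{pmatrix} 1 & 0 & 0\\ -2 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}$. We get

\[
E_1 M = \begin{pmatrix} 1 & 0 & 0\\ -2 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 4\\ 2 & 4 & 1\\ 1 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 4\\ 0 & 0 & -7\\ 1 & 1 & 1 \end{pmatrix}.
\]


The next step might be to replace row 3 by multiplying row 1 by $-1$ and adding to row 3. This is the
same as multiplying $E_1 M$ on the left by $E_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -1 & 0 & 1 \end{pmatrix}$. We get

\[
E_2(E_1 M) = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 4\\ 0 & 0 & -7\\ 1 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 4\\ 0 & 0 & -7\\ 0 & -1 & -3 \end{pmatrix}.
\]


The next three steps might be to replace rows 2 and 3 by doing three operations. First, swap rows
2 and 3 and then multiply the new rows 2 and 3 by $-1$ and $-1/7$, respectively. This is equivalent to
left-multiplying by the elementary matrices


\[
E_3 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix},\quad
E_4 = \begin{pmatrix} 1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 1 \end{pmatrix},\quad\text{and}\quad
E_5 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1/7 \end{pmatrix}.
\]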


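This multiplication results in


\[
E_5 E_4 E_3 (E_2 E_1 M) =
\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1/7 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 4\\ 0 & 0 & -7\\ 0 & -1 & -3 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 4\\ 0 & 1 & 3\\ 0 & 0 & 1 \end{pmatrix}.
\]

Next, we replace row 1 by multiplying row 2 by $-2$ and adding it to row 1 or, equivalently, we
left-multiply by $E_6 = \begin{pmatrix} 1 & -2 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}$. Since $4 + (-2)(3) = -2$, this results in

\[
E_6 (E_5 E_4 E_3 E_2 E_1 M) =
\begin{pmatrix} 1 & -2 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 4\\ 0 & 1 & 3\\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & -2\\ 0 & 1 & 3\\ 0 & 0 & 1 \end{pmatrix}.
\]


Finally, the last two steps are completed by left-multiplying by the elementary matrices


\[
E_7 = \begin{pmatrix} 1 & 0 & 2\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\quad\text{and}\quad
E_8 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & -3\\ 0 & 0 & 1 \end{pmatrix},
\]


which will replace row 1 by multiplying row 3 by 2 and adding to row 1 and then replace row 2
by multiplying row 3 by $-3$ and adding to row 2. This results in


\[
E_8 E_7 (E_6 E_5 E_4 E_3 E_2 E_1 M) =
\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & -3\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 2\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & -2\\ 0 & 1 & 3\\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} = I.
\]


So, we see that we can reduce $M$, in this case to $I$, by multiplying by elementary matrices in this way:


\[
I = E_8 E_7 E_6 E_5 E_4 E_3 E_2 E_1 M.
\]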
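A chain of products like this is easy to get wrong by hand, so it is worth replaying it with exact arithmetic. The sketch below is an illustration under our own conventions: `matmul` and `steps` are our names, and $E_7$ is taken to be the matrix that adds 2 times row 3 to row 1, which is what clearing the entry $4 - 2 \cdot 3 = -2$ left in row 1 by $E_6$ requires.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


M = [[1, 2, 4], [2, 4, 1], [1, 1, 1]]

# Elementary matrices in the order they are applied (E1 first, E8 last).
steps = [
    [[1, 0, 0], [-2, 1, 0], [0, 0, 1]],               # E1: add -2 * row 1 to row 2
    [[1, 0, 0], [0, 1, 0], [-1, 0, 1]],               # E2: add -1 * row 1 to row 3
    [[1, 0, 0], [0, 0, 1], [0, 1, 0]],                # E3: swap rows 2 and 3
    [[1, 0, 0], [0, -1, 0], [0, 0, 1]],               # E4: multiply row 2 by -1
    [[1, 0, 0], [0, 1, 0], [0, 0, Fraction(-1, 7)]],  # E5: multiply row 3 by -1/7
    [[1, -2, 0], [0, 1, 0], [0, 0, 1]],               # E6: add -2 * row 2 to row 1
    [[1, 0, 2], [0, 1, 0], [0, 0, 1]],                # E7: add 2 * row 3 to row 1
    [[1, 0, 0], [0, 1, -3], [0, 0, 1]],               # E8: add -3 * row 3 to row 2
]

current = M
for index, E in enumerate(steps, start=1):
    current = matmul(E, current)
    print(f"after E{index}:", [[str(x) for x in row] for row in current])

print(current == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

Because each new matrix multiplies on the left, the loop builds exactly the product $E_8 E_7 \cdots E_1 M$.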


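The same process works for matrices that do not reduce to the identity. Let us see this in a quick
example.


Example 2.2.26 Let $M = \begin{pmatrix} 1 & 2 & 3\\ 1 & 2 & 3\\ 2 & 4 & 6 \end{pmatrix}$. We can reduce $M$ to reduced echelon form by replacing rows
2 and 3 by two row operations. First, we replace row 2 by multiplying row 1 by $-1$ and adding to
row 2. Second, we replace row 3 by multiplying row 1 by $-2$ and adding to row 3. These row
operations are equivalent to multiplying by the elementary matrices


\[
E_1 = \begin{pmatrix} 1 & 0 & 0\\ -1 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\quad\text{and}\quad
E_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -2 & 0 & 1 \end{pmatrix}.
\]

Our final result is

\[
E_2 E_1 M =
\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -2 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0\\ -1 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3\\ 1 & 2 & 3\\ 2 & 4 & 6 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 3\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}.
\]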


Even though M does not reduce to the identity, multiplying by elementary matrices still works to 
reduce M to reduced echelon form. 


In this section, we built tools that helped us systematically answer the question of Section 2.1 about
Images 3 and 4. More importantly, we have tools to answer many similar questions. To this point, we
have used familiar operations (addition and multiplication) on images. In Section 2.3, we will explore
the properties of addition and multiplication on the real numbers and how they are related to properties
for the same operations, but on images. First, we will review the proper use of terminology presented
in this subsection.


Watch Your Language! When solving systems of equations or reducing matrices, it is important to use
the proper language surrounding each.


✓ We solve a system of equations.

✓ We reduce a matrix.

✓ The system of equations has a free variable.

✓ The solution to the system of equations is $(a_1, a_2, \ldots, a_n)$.


We do not say


✗ We solve a matrix.
✗ The matrix has a free variable.
✗ The solutions to the system of equations are $a_1, a_2, \ldots, a_n$.


2.2.4 The Geometry of Systems of Equations 


It turns out that there is an intimate connection between solutions to systems of equations in two
variables and the geometry of lines in $\mathbb{R}^2$. We recall the graphical method for solving systems below.
Although you will likely have already done this in previous classes, we include it here so that you can put
this knowledge into the context of solution sets to systems of equations as classified in Theorem 2.2.20.
We begin with the following simple example:


Example 2.2.27 Let us consider $u = \begin{pmatrix} 2\\ -3 \end{pmatrix}$, $v = \begin{pmatrix} 1\\ 1 \end{pmatrix}$, and $w = \begin{pmatrix} 2\\ 3 \end{pmatrix} \in \mathbb{R}^2$. Suppose we want to
know if we can express $u$ using arithmetic operations on $v$ and $w$. In other words, we want to know if
there are scalars $x, y$ so that

\[
\begin{pmatrix} 2\\ -3 \end{pmatrix} = x \begin{pmatrix} 1\\ 1 \end{pmatrix} + y \begin{pmatrix} 2\\ 3 \end{pmatrix}.
\]


We can rewrite the right-hand side of the vector equation so that we have the equation with two vectors


\[
\begin{pmatrix} 2\\ -3 \end{pmatrix} = \begin{pmatrix} x + 2y\\ x + 3y \end{pmatrix}.
\]


The equivalent system of linear equations with 2 equations and 2 variables is


\[
x + 2y = 2 \tag{2.18}
\]
\[
x + 3y = -3. \tag{2.19}
\]


Equations (2.18) and (2.19) are equations of lines in $\mathbb{R}^2$, that is, the set of pairs $(x, y)$ that satisfy
each equation is the set of points on each respective line. Hence, finding $x$ and $y$ that satisfy both
equations amounts to finding all points $(x, y)$ that are on both lines. If we graph these two lines, we
can see that they appear to cross at one point, $(12, -5)$, and nowhere else, so we estimate $x = 12$ and
$y = -5$ is the only solution of the two equations. (See Figure 2.9.) You can also algebraically verify
that $(12, -5)$ is a solution to the system.

Fig. 2.9 The equations of Example 2.2.27.


The graphical method for solving systems of equations can be very inaccurate when the solution is 
a point that does not line up very well with integer coordinates. In general, we prefer using algebraic 
techniques to solve systems. 
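For comparison with the graphical estimate, here is a minimal sketch of the algebraic route for two equations in two variables: eliminate $x$ from the second equation, solve for $y$, and back-substitute. The function name `solve_two_by_two` and the `(a, b, c)` encoding of $ax + by = c$ with integer coefficients are our own conventions.

```python
from fractions import Fraction


def solve_two_by_two(eq1, eq2):
    """Solve two equations a*x + b*y = c, each given as an (a, b, c) triple.

    Returns (x, y) when the solution is unique and None otherwise.
    """
    (a1, b1, c1), (a2, b2, c2) = eq1, eq2
    if a1 == 0:
        # Put an equation with a nonzero x-coefficient first, if there is one.
        (a1, b1, c1), (a2, b2, c2) = (a2, b2, c2), (a1, b1, c1)
    if a1 == 0:
        return None  # x appears in neither equation, so no unique solution
    factor = Fraction(a2, a1)
    b = b2 - factor * b1  # eliminate x from the second equation
    c = c2 - factor * c1
    if b == 0:
        return None  # parallel or identical lines
    y = c / b
    x = (c1 - b1 * y) / a1
    return x, y


# Equations (2.18) and (2.19): x + 2y = 2 and x + 3y = -3.
x, y = solve_two_by_two((1, 2, 2), (1, 3, -3))
print(f"x = {x}, y = {y}")
```

Returning `None` covers both remaining cases (no solution and infinitely many); distinguishing them would need one more comparison of the right-hand sides.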

However, the geometry does give us some intuition about what is going on. Recall that Theorem 2.2.20 
classified systems of equations by their solution sets; a system has either a unique solution, no solution, 
or infinitely many solutions. Example 2.2.27 showed the graphical interpretation of a case where there 
is a unique solution to the system. 

Exercise 35 asks you to explore the other two cases: 


• What would a system of two equations in two variables and no solution look like, graphically?

• What would a system of two equations in two variables and infinitely many solutions look like,
graphically? (Fun fact: this gives you some intuition about why it is impossible to have a system
of equations with exactly two solutions. See Exercise 35.)


For systems with more variables, Theorem 2.2.20 guarantees that we have the same three possible
types of solution sets: the empty set, the set containing only one point, and a set containing infinitely
many points. To see this geometrically, you need to know that solution sets to linear equations with
$n$ variables correspond to $(n - 1)$-dimensional hyperplanes in $\mathbb{R}^n$. These hyperplanes can intersect in
either a single point, infinitely many points, or not at all.

For ease of visualization, we illustrate the scenario involving a system of three equations in three
variables, in which case each equation corresponds to a two-dimensional plane in $\mathbb{R}^3$. Figure 2.10 shows
some of the possible ways three planes will cross. There are other ways three planes can intersect. In
Exercise 33, you will be asked to describe the other possibilities.


Fig. 2.10 Geometric visualization of possible solution sets (in black) for systems of 3 equations and 3 variables. What
are other possible configurations of planes so that there is no solution? (See Exercise 33.)


2.2.5 Exercises


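Using the method of elimination, solve each system of equations:


1.
\[
\begin{aligned}
x + y &= 2\\
x - y &= 1
\end{aligned}
\]

2.
\[
\begin{aligned}
2x + 3y &= -5\\
2x - 2y &= 10
\end{aligned}
\]

3. Which of the following points are solutions to the given system of equations?

\[
\begin{aligned}
x_1 + 2x_2 - x_3 + x_4 &= 4\\
x_1 + 2x_2 - x_4 &= 2\\
-x_1 - 2x_2 - x_3 + 3x_4 &= 0
\end{aligned}
\]

(a) $(2, 0, -2, 0)$
(b) $(0, 1, -2, 1)$
(c) $(1, 1, 0, 1)$
(d) $(2 + a, 0, 2a - 2, a)$, where $a \in \mathbb{R}$


Solve each of the systems of equations below. If the system has infinitely many solutions, choose
your free variables, write out three solutions, and determine a pattern to the solutions.


4.
\[
\begin{aligned}
x - y - z &= 4\\
2x - y + 3z &= 2\\
x + y - 2z &= -1
\end{aligned}
\]

5.
\[
\begin{aligned}
2x - y - 3z &= 1\\
3x + y - 3z &= 4\\
-2x + y + 2z &= -1
\end{aligned}
\]

6.
\[
\begin{aligned}
x - 2y - 3z &= 2\\
4x + y - 2z &= 8\\
5x - y - 5z &= 10
\end{aligned}
\]

7.
\[
\begin{aligned}
x - 2y - 3z &= 2\\
4x + y - 2z &= 8\\
5x - y - 5z &= 3
\end{aligned}
\]

8.
\[
\begin{aligned}
x - 2y - 3z &= 2\\
2x - 4y - 6z &= 4\\
-x + 2y + 3z &= -2
\end{aligned}
\]


Use an augmented matrix to solve each of the systems of equations below by reducing the matrix to
reduced echelon form. If the system has infinitely many solutions, choose your free variables, write
out three solutions, and determine a pattern to the solutions.


9.
\[
\begin{aligned}
x - y - z &= 4\\
2x - y + 3z &= 2\\
x + y - 2z &= -1
\end{aligned}
\]

10.
\[
\begin{aligned}
2x - y - 3z &= 1\\
3x + y - 3z &= 4\\
-2x + y + 2z &= -1
\end{aligned}
\]

11.
\[
\begin{aligned}
x - 2y - 3z &= 2\\
4x + y - 2z &= 8\\
5x - y - 5z &= 10
\end{aligned}
\]

12.
\[
\begin{aligned}
x - 2y - 3z &= 2\\
4x + y - 2z &= 8\\
5x - y - 5z &= 3
\end{aligned}
\]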


13. One needs to be very cautious when combining row operations into a single “step.” Start with
the matrix $A$ given below:


\[
A = \begin{pmatrix} 1 & 1 & 2 & 3\\ 5 & 8 & 13 & 0\\ 3 & 2 & 1 & 1 \end{pmatrix}
\]


(a) First perform all of the following row operations on the matrix A in a single step: 
e Rk} =2ri-—1r 
• $R_2 = r_2 + 2r_3$
• $R_3 = r_3 + (1/2)r_2$.
What matrix results? 


(b) Next, sequentially perform the row operations on A, showing the intermediate matrix after 
each step. What matrix results? 


(c) Explain what happened. Why does this mean you need to be cautious about combining row 
operations? 


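For Exercises 14–20, determine the elementary matrices needed to transform $M = \begin{pmatrix} 2 & 4 & 1\\ 3 & 2 & 2\\ 1 & 2 & 1 \end{pmatrix}$ to $M'$.
Use matrix multiplication to verify your answer.

14. $M' = \begin{pmatrix} 2 & 4 & 1\\ -6 & -4 & -4\\ 1 & 2 & 1 \end{pmatrix}$

15. $M' = \begin{pmatrix} 1 & 2 & 1\\ 3 & 2 & 2\\ 2 & 4 & 1 \end{pmatrix}$

16. $M' = \begin{pmatrix} 0 & 0 & -1\\ 3 & 2 & 2\\ 1 & 2 & 1 \end{pmatrix}$

17.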


18. 


19. 


20. 


ee | 
M=|6 4 4 
PW 
ile | 
M’=|322 
241 
fo 4 
M=[0 0 -1 
0-4-1 
100 
M’={|010 
001 


Additional Questions.


21. Heather has one week left on her summer job doing landscaping for the city. She can take on
three types of jobs which require 5, 6, and 7 hours, respectively. She has agreed to complete
seven jobs in the coming week. How many of each type of job should she complete in order to
work a full 40-hour week?

22. The water flow rate $w$ at a river gauge at time $t$ is given in the following data table:


time (hour)    | 0   1   2
flow (ft³/sec) | 20  20  40


Find a quadratic function $w = at^2 + bt + c$ that fits the data. Discuss possible benefits of having
this function. [Hint: Use the data to find a system of linear equations in the unknowns $a$, $b$,
and $c$.]

23. The water flow rate $w$ at a river gauge at time $t$ is given in the following data table:


time (hour)    | 0   1   2   3
flow (ft³/sec) | 20  20  40  60


Show that no quadratic function $w = at^2 + bt + c$ fits the data exactly.

24. Wendy makes three types of wooden puzzles in her shop. The first type of puzzle takes 1 hour to
make and sells for $10. The second type of puzzle takes 3 hours to make and sells for $20. The
third type of puzzle takes 3 hours to make and sells for $30. What is the maximum revenue that
Wendy can realize if she works for 100 hours making puzzles?

25. In Example 2.2.18, one of the variables can never be considered free, no matter the elimination
process. Which variable is it? How do you know?

26. If a system of equations has corresponding augmented matrix whose reduced form has the same
number of leading entries, on the left of the augment line, as variables, how do we know there is
a unique solution?

27. Why do we know that there is no solution to a system of equations whose corresponding
augmented matrix has reduced form with a leading entry to the right of the augment line?

28. If a system of equations has corresponding matrix whose reduced form has only leading entries
to the left of the augment line, but there are fewer leading entries than variables, how do we know
there are infinitely many solutions?

29. Prove that every matrix can be reduced to echelon form.

30. Prove that if a matrix $P$ is row equivalent to a matrix $Q$ (i.e., $P \sim Q$), then $Q$ is row equivalent
to $P$ (i.e., $Q \sim P$).

31. Prove Lemma 2.2.12. 

32. Find examples of systems of two equations and two variables that exemplify the other possibilities 
of Theorem 2.2.20. 


(a) What is the graphical interpretation of a system of two equations in two variables that has no 
solution? Give an example of a system of two equations in two variables that has no solution. 
Show both graphical and algebraic solutions for your system. 

(b) What is the graphical interpretation of a system of two equations in two variables that has 
infinitely many solutions? Give an example of a system of two equations in two variables that 
has infinitely many solutions. Show both graphical and algebraic solutions for your system. 


33. Figure 2.10 depicts some of the possible solution sets to a system of 3 equations with 3 variables. 
Sketch or describe the other possibilities. 

34. If $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$ are two solutions to a system of linear equations (with $m$
equations and $n$ variables), what can you say about the $n$-tuple of averages
$((x_1 + y_1)/2, (x_2 + y_2)/2, \ldots, (x_n + y_n)/2)$? Justify your assertion.

35. Prove that if a system of equations has more than one solution, then it has infinitely many solutions.


2.3 Vector Spaces


In Section 2.1, we saw that the set of images possessed a number of convenient algebraic properties. It
turns out that any set that possesses similar convenient properties can be analyzed in a similar way. In
linear algebra, we study such sets and develop tools to analyze them. We call these sets vector spaces.


2.3.1 Images and Image Arithmetic


In Section 2.1 we saw that if you add two images, you get a new image, and that if you multiply an 
image by a scalar, you get a new image. We represented a rectangular pixelated image as an array of 
values, or equivalently, as a rectangular array of grayscale patches. This is a very natural idea in the 
context of digital photography. 

Recall the definition of an image given in Section 2.1. We repeat it here, and follow the definition 
by some examples of images with different geometric arrangements. 


Definition 2.3.1 


An image is a finite ordered list of real values with an associated geometric arrangement. 


Four examples of arrays along with an index system specifying the order of patches can be seen 
in Figure 2.11. As an image, each patch would also have a numerical value indicating the brightness 
of the patch (not shown in the figure). The first is a regular pixel array commonly used for digital 
photography. The second is a hexagonal pattern which also nicely tiles a plane. The third is a map 
of the African continent and Madagascar subdivided by country. The fourth is a square pixel set with 
enhanced resolution toward the center of the field of interest. It should be clear from the definition that 
images are not matrices. Only the first example might be confused with a matrix.


Fig. 2.11 Examples of image arrays. Numbers indicate example pixel ordering.


We first fix a particular geometric arrangement of pixels (and let n denote the number of pixels 
in the arrangement). Then an image is precisely described by its (ordered) intensity values. With 
this determined, we formalize the notions of scalar multiplication and addition on images that were 
developed in the previous section. 


Definition 2.3.2

Given two images $x$ and $y$ with (ordered) intensity values $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$,
respectively, and the same geometry, the image sum, written $z = x + y$, is the image with intensity
values $z_i = x_i + y_i$ for all $i \in \{1, 2, \ldots, n\}$, and the same geometry.


Hence, the sum of two images is the image that results from pixel-wise addition of intensity values.
Put another way, the sum of two images is the image that results from adding corresponding values of
their ordered lists, while maintaining the geometric arrangement of pixels.


Definition 2.3.3

Given scalar $\alpha$ and image $x$ with (ordered) intensity values $(x_1, x_2, \ldots, x_n)$, the scalar product,
written $z = \alpha x$, is the image with intensity values $z_i = \alpha x_i$ for all $i \in \{1, 2, \ldots, n\}$, and the same
geometry.


A scalar times an image is the image that results from pixel-wise scalar multiplication. That is, 
a scalar times an image is the image which results from multiplication of each of the values in the 
ordered list by that scalar, while maintaining the geometric arrangement of pixels. 
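Once a geometric arrangement is fixed, Definitions 2.3.2 and 2.3.3 translate directly into operations on ordered lists. The following is a minimal sketch under that assumption; the function names `image_sum` and `scalar_product` and the sample intensity values are ours, chosen only for illustration.

```python
def image_sum(x, y):
    """Pixel-wise sum of two images that share the same geometry."""
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    return [xi + yi for xi, yi in zip(x, y)]


def scalar_product(alpha, x):
    """Multiply every pixel intensity of the image x by the scalar alpha."""
    return [alpha * xi for xi in x]


# Two hypothetical 4-pixel images with the same (unspecified) arrangement.
x = [0, 4, 8, 2]
y = [1, 1, 0, 5]

print(image_sum(x, y))       # [1, 5, 8, 7]
print(scalar_product(3, x))  # [0, 12, 24, 6]
```

The geometry itself never enters the computation; it only determines how the resulting list is displayed.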

We found that these basic arithmetic operations on images lead to a key property: any combination 
of arithmetic operations on images results in an image of the same configuration. In other words, adding 
two images always yields an image, and multiplying an image by a scalar always yields an image. We 
formalize this notion with the concept of closure: 


Definition 2.3.4


Consider a set of objects $X$ with scalars taken from $\mathbb{R}$, and operations of addition ($+$) and scalar
multiplication ($\cdot$) defined on $X$. We say that $X$ is closed under addition if $x + y \in X$ for all
$x, y \in X$. We say that $X$ is closed under scalar multiplication if $\alpha \cdot x \in X$ for each $x \in X$ and
each $\alpha \in \mathbb{R}$.


Let $\mathcal{I}_{m \times n}$ denote the set of all $m \times n$ rectangular images. We see that the set $\mathcal{I}_{4 \times 4}$, from which we
considered some example images in Section 2.1, is closed under addition and scalar multiplication.
This arithmetic with images in $\mathcal{I}_{4 \times 4}$ also satisfies a number of other natural properties:


• (Commutativity of image addition.) If $I_A$ and $I_B$ are images in $\mathcal{I}_{4 \times 4}$, then $I_A + I_B = I_B + I_A$.
For example, [Figure: Image A + Image B shown equal to Image B + Image A.]

• (Associativity of image addition.) If $I_A$, $I_B$, and $I_C$ are images in $\mathcal{I}_{4 \times 4}$, then $(I_A + I_B) + I_C =
I_A + (I_B + I_C)$. For example, [Figure: (Image A + Image B) + Image C shown equal to Image A + (Image B + Image C).]

• (Associativity of scalar multiplication.) If $\alpha, \beta \in \mathbb{R}$, and $I \in \mathcal{I}_{4 \times 4}$, then $\alpha(\beta I) = (\alpha\beta)I$, e.g.,
[Figure: two scalings of Image A shown equal.]

• (Distributivity of Scalar Multiplication over Image Addition.) If $\alpha \in \mathbb{R}$ and $I_A, I_B \in \mathcal{I}_{4 \times 4}$, then
$\alpha(I_A + I_B) = \alpha I_A + \alpha I_B$, e.g., [Figure: a scaled sum of Image A and Image B shown equal to the sum of the scaled images.]

• (Additive identity image.) There is a zero image in $\mathcal{I}_{4 \times 4}$: the image that has every pixel intensity
equal to zero. The sum of the zero image and any other image $I$ is just $I$.

• (Additive inverses.) For every image $I \in \mathcal{I}_{4 \times 4}$, there is an image $J$ so that the sum $I + J$ is just
the zero image. (Recall that the set of images includes those that can be captured by your camera,
but there are many more, some with negative pixel intensities as well.)

• (Multiplicative identity.) For any image $I \in \mathcal{I}_{4 \times 4}$, the scalar product $1 \cdot I = I$.


The fact that the space $\mathcal{I}_{4 \times 4}$ of $4 \times 4$ images has these properties will enable us to develop tools for working
with images. In fact, we will be able to develop tools for any set (and field of scalars) that satisfies
these properties. We will call such sets vector spaces.
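The listed properties can be spot-checked on sample images. The sketch below is a numerical sanity check, not a proof: it stores a $4 \times 4$ image as an ordered list of 16 integer intensities, and the names `random_image` and `check_properties`, the random seed, and the sample scalars are all our own choices.

```python
import random


def image_sum(x, y):
    """Pixel-wise sum of two images with the same geometry."""
    return [xi + yi for xi, yi in zip(x, y)]


def scalar_product(alpha, x):
    """Pixel-wise product of the scalar alpha with the image x."""
    return [alpha * xi for xi in x]


random.seed(0)
N_PIXELS = 16  # a 4 x 4 image stored as an ordered list of 16 intensities


def random_image():
    """A sample image; negative intensities are allowed, as in the text."""
    return [random.randint(-10, 10) for _ in range(N_PIXELS)]


def check_properties(a, b, c, alpha, beta):
    """Evaluate each listed property on the given images and scalars."""
    zero = [0] * N_PIXELS
    return {
        "commutativity": image_sum(a, b) == image_sum(b, a),
        "associativity of addition":
            image_sum(image_sum(a, b), c) == image_sum(a, image_sum(b, c)),
        "associativity of scalar multiplication":
            scalar_product(alpha, scalar_product(beta, a))
            == scalar_product(alpha * beta, a),
        "distributivity":
            scalar_product(alpha, image_sum(a, b))
            == image_sum(scalar_product(alpha, a), scalar_product(alpha, b)),
        "additive identity": image_sum(zero, a) == a,
        "additive inverse": image_sum(a, scalar_product(-1, a)) == zero,
        "multiplicative identity": scalar_product(1, a) == a,
    }


results = check_properties(random_image(), random_image(), random_image(), 3, -2)
for name, holds in results.items():
    print(f"{name}: {holds}")
```

Integer intensities are used so that the equality tests are exact; with floating-point intensities the comparisons would need a tolerance.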


2.3.2 Vectors and Vector Spaces 


In the last section, we saw that the set of 4 x 4 images, together with real scalars, satisfies several 
natural properties. There are in fact many other sets of objects that also have these properties. 

One class of objects with these properties are the vectors that you may have seen in a course in 
multivariable calculus or physics. In those courses, vectors are objects with a fixed number, say n, 
of values put together into an ordered tuple. That is, the word vector may bring to mind something 


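that looks like $(a, b)$, $(a, b, c)$, or $(a_1, a_2, \ldots, a_n)$. Maybe you have even seen notation like any of the
following:

\[
(a, b),\quad (a, b, c),\quad (a_1, a_2, \ldots, a_n),\quad
\begin{pmatrix} a\\ b\\ c \end{pmatrix},\quad
\begin{bmatrix} a\\ b\\ c \end{bmatrix},\quad
\begin{pmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{pmatrix},\quad
\begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{bmatrix}
\]


called vectors as well.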

In this section, we generalize the notion of a vector. In particular, we will understand that images 
and other classes of objects can be vectors in an appropriate context. When we consider objects like 
brain images, radiographs, or heat state signatures, it is often useful to understand them as collections 
having certain natural mathematical properties. Indeed, we will develop mathematical tools that can 
be used on all such sets, and these tools will be instrumental in accomplishing our application tasks. 

We haven’t yet made the definition of a vector space (or even a vector) rigorous. We still have some
more setup to do. In this text, we will primarily use two scalar fields⁶: $\mathbb{R}$ and $Z_2$. The field $Z_2$ is the
two-element (or binary) set $\{0, 1\}$ with addition and multiplication defined modulo 2. That is, addition
defined modulo 2 means that

\[
0 + 0 = 0, \quad 0 + 1 = 1 + 0 = 1, \quad\text{and}\quad 1 + 1 = 0.
\]

And multiplication defined modulo 2 means

\[
0 \cdot 0 = 0, \quad 0 \cdot 1 = 1 \cdot 0 = 0, \quad\text{and}\quad 1 \cdot 1 = 1.
\]


We can think of the two elements as “on” and “off” and the operations as binary operations. If we
add 1, we flip the switch and if we add 0, we do nothing. We know that $Z_2$ is closed under scalar
multiplication and vector addition.


⁶ The definition of a field can be found in Appendix D. The important thing to remember about fields (for the material
in this book) is that there are two operations (called addition and multiplication) that satisfy properties we usually see
with real numbers.
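The arithmetic of $Z_2$ is small enough to tabulate completely. The sketch below (with our own function names `add_mod2` and `mul_mod2`) prints both tables and confirms closure by checking every pair of elements.

```python
def add_mod2(a, b):
    """Addition in Z_2: ordinary addition followed by reduction modulo 2."""
    return (a + b) % 2


def mul_mod2(a, b):
    """Multiplication in Z_2: ordinary multiplication reduced modulo 2."""
    return (a * b) % 2


Z2 = (0, 1)

for a in Z2:
    for b in Z2:
        print(f"{a} + {b} = {add_mod2(a, b)}    {a} * {b} = {mul_mod2(a, b)}")

# Closure: every sum and every product of elements of Z_2 is again in Z_2.
closed = all(
    add_mod2(a, b) in Z2 and mul_mod2(a, b) in Z2 for a in Z2 for b in Z2
)
print("closed:", closed)
```

On the values 0 and 1, addition modulo 2 agrees with the bitwise exclusive-or `a ^ b` and multiplication with the bitwise and `a & b`, which matches the "flip the switch" description.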


Definition 2.3.5 


Consider a set $V$ over a field $\mathbb{F}$ with given definitions for addition ($+$) and scalar multiplication
($\cdot$). $V$ with $+$ and $\cdot$ is called a vector space over $\mathbb{F}$ if for all $u, v, w \in V$ and for all $\alpha, \beta \in \mathbb{F}$, the
following ten properties hold:


(P1) Closure Property for Addition: $u + v \in V$.

(P2) Closure Property for Scalar Multiplication: $\alpha \cdot v \in V$.

(P3) Commutative Property for Addition: $u + v = v + u$.

(P4) Associative Property for Addition: $(u + v) + w = u + (v + w)$.

(P5) Associative Property for Scalar Multiplication: $\alpha \cdot (\beta \cdot v) = (\alpha\beta) \cdot v$.

(P6) Distributive Property of Scalar Multiplication Over Vector Addition: $\alpha \cdot (u + v) = \alpha \cdot u +
\alpha \cdot v$.

(P7) Distributive Property of Scalar Multiplication Over Scalar Addition: $(\alpha + \beta) \cdot v = \alpha \cdot v +
\beta \cdot v$.

(P8) Additive Identity Property: $V$ contains the additive identity, denoted $0$, so that $0 + v =
v + 0 = v$ for every $v \in V$.

(P9) Additive Inverse Property: $V$ contains additive inverses $z$ so that for every $v \in V$ there is a
$z \in V$ satisfying $v + z = 0$.

(P10) Multiplicative Identity Property for Scalars: The scalar set $\mathbb{F}$ has an identity element, denoted
$1$, for scalar multiplication that has the property $1 \cdot v = v$ for every $v \in V$.


Notation. We will use the notation $(V, +, \cdot)$ to indicate the vector space corresponding to the set $V$
with operations $+$ and $\cdot$.

In Definition 2.3.5, we label the properties as (P1)—(P10). It should be noted that though we use this 
labeling in other places in this text, these labels are not standard. This means that you should focus 
more on the property description and name rather than the labeling. 


Definition 2.3.6 


Given a vector space $(V, +, \cdot)$ over $\mathbb{F}$, we say that $v \in V$ is a vector. That is, elements of a vector
space are called vectors.


In this text, we will use context to indicate vectors with a letter, such as $v$ or $x$. In some courses and
textbooks, vectors are denoted with an arrow over the name, $\vec{v}$, or with bold type, $\mathbf{v}$. Also, we usually
write scalar multiplication using juxtaposition, i.e., we write $\alpha x$ rather than $\alpha \cdot x$, to distinguish it from
the dot product that we will encounter in Section 7.1.




We will discuss many vector spaces for which there are operations and fields that are typically used. For 
example, the set of real numbers R is a set for which we have a common understanding about what it 
means to add and multiply. For these vector spaces, we call the operations the “standard operations” and 
the field is called the “standard field.” So, we might typically say that R, with the standard operations, 
is a vector space over itself. 


Watch Your Language! The language used to specify a vector space requires that we state the set $V$, the
two operations $+$ and $\cdot$, and the field $\mathbb{F}$. We make this clear with notation and/or in words. Two ways to
communicate this are


✓ $(V, +, \cdot)$ is a vector space over $\mathbb{F}$.

✓ $V$ with the operations $+$ and $\cdot$ is a vector space over the field $\mathbb{F}$.

We should not say (unless ambiguity has been removed)

✗ $V$ is a vector space.

Definition 2.3.5 is so important and has so many pieces that we will take the time to present many
examples. As we do so, consider the following. The identity element for scalar multiplication need not
be the number 1. The zero vector need not be (and in general is not) the number 0. The elements of a
vector space are called vectors but need not look like the vectors presented above.


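Example 2.3.7 The set of $4 \times 4$ images $\mathcal{I}_{4 \times 4}$ satisfies properties (P1)–(P10) of a vector space.


Example 2.3.8 Let us consider the set


\[
\mathbb{R}^3 = \left\{ \begin{pmatrix} a_1\\ a_2\\ a_3 \end{pmatrix} \;\middle|\; a_1, a_2, a_3 \in \mathbb{R} \right\}.
\]


This means that $\mathbb{R}^3$ is the set of all ordered triples where each entry in the triple is a real number.
We can show that $\mathbb{R}^3$ is also a vector space over $\mathbb{R}$ with addition and scalar multiplication defined
component-wise. This means that, for $a, b, c, d, f, g, \alpha \in \mathbb{R}$,


\[
\begin{pmatrix} a\\ b\\ c \end{pmatrix} + \begin{pmatrix} d\\ f\\ g \end{pmatrix}
= \begin{pmatrix} a + d\\ b + f\\ c + g \end{pmatrix}
\quad\text{and}\quad
\alpha \begin{pmatrix} a\\ b\\ c \end{pmatrix} = \begin{pmatrix} \alpha a\\ \alpha b\\ \alpha c \end{pmatrix}.
\]


We show this by verifying that each of the ten properties of a vector space is true for $\mathbb{R}^3$ with addition
and scalar multiplication defined this way.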


Proof. Let $u, v, w \in \mathbb{R}^3$ and let $\alpha, \beta \in \mathbb{R}$. Then, there are real numbers $a, b, c, d, f, g, h, k$, and $\ell$ so that

\[ u = \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \quad v = \begin{pmatrix} d \\ f \\ g \end{pmatrix}, \quad\text{and}\quad w = \begin{pmatrix} h \\ k \\ \ell \end{pmatrix}. \]

We will show that properties (P1)-(P10) from Definition 2.3.5 hold. We first show the two closure
properties.


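(P1) Now since $\mathbb{R}$ is closed under addition, we know that $a+d$, $b+f$, and $c+g$ are real numbers.
So,

\[ u + v = \begin{pmatrix} a \\ b \\ c \end{pmatrix} + \begin{pmatrix} d \\ f \\ g \end{pmatrix} = \begin{pmatrix} a+d \\ b+f \\ c+g \end{pmatrix} \in \mathbb{R}^3. \]

Thus, $\mathbb{R}^3$ is closed under addition.

(P2) Since $\mathbb{R}$ is closed under multiplication, we know that $\alpha a$, $\alpha b$, and $\alpha c$ are real numbers. So

\[ \alpha \cdot u = \alpha \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} \alpha a \\ \alpha b \\ \alpha c \end{pmatrix} \in \mathbb{R}^3. \]

Thus, $\mathbb{R}^3$ is closed under scalar multiplication.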
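(P3) Now, we show that the commutative property of addition holds in $\mathbb{R}^3$. Using the fact that addition
on $\mathbb{R}$ is commutative, we see that

\[ u + v = \begin{pmatrix} a \\ b \\ c \end{pmatrix} + \begin{pmatrix} d \\ f \\ g \end{pmatrix} = \begin{pmatrix} a+d \\ b+f \\ c+g \end{pmatrix} = \begin{pmatrix} d+a \\ f+b \\ g+c \end{pmatrix} = v + u. \]

Therefore, addition on $\mathbb{R}^3$ is commutative.
(P4) Using the associative property of addition on $\mathbb{R}$, we show that addition on $\mathbb{R}^3$ is also associative.
Indeed, we have

\[ (u + v) + w = \begin{pmatrix} (a+d)+h \\ (b+f)+k \\ (c+g)+\ell \end{pmatrix} = \begin{pmatrix} a+(d+h) \\ b+(f+k) \\ c+(g+\ell) \end{pmatrix} = u + (v + w). \]

So, addition on $\mathbb{R}^3$ is associative.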
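(P5) Since multiplication on $\mathbb{R}$ is associative, we see that

\[ \alpha \cdot (\beta \cdot u) = \begin{pmatrix} \alpha \cdot (\beta \cdot a) \\ \alpha \cdot (\beta \cdot b) \\ \alpha \cdot (\beta \cdot c) \end{pmatrix} = \begin{pmatrix} (\alpha\beta) \cdot a \\ (\alpha\beta) \cdot b \\ (\alpha\beta) \cdot c \end{pmatrix} = (\alpha \cdot \beta) \cdot u. \]

Thus, scalar multiplication on $\mathbb{R}^3$ is associative.
(P6) Since property (P6) holds for $\mathbb{R}$, we have that

\[ \alpha \cdot (u + v) = \begin{pmatrix} \alpha \cdot (a+d) \\ \alpha \cdot (b+f) \\ \alpha \cdot (c+g) \end{pmatrix} = \begin{pmatrix} \alpha \cdot a + \alpha \cdot d \\ \alpha \cdot b + \alpha \cdot f \\ \alpha \cdot c + \alpha \cdot g \end{pmatrix} = \alpha \cdot u + \alpha \cdot v. \]

Thus, scalar multiplication distributes over vector addition for $\mathbb{R}^3$.
(P7) Next, using the fact that (P7) is true on $\mathbb{R}$, we get

\[ (\alpha + \beta) \cdot u = \begin{pmatrix} (\alpha+\beta) \cdot a \\ (\alpha+\beta) \cdot b \\ (\alpha+\beta) \cdot c \end{pmatrix} = \begin{pmatrix} \alpha \cdot a + \beta \cdot a \\ \alpha \cdot b + \beta \cdot b \\ \alpha \cdot c + \beta \cdot c \end{pmatrix} = \alpha \cdot u + \beta \cdot u. \]

This means that scalar multiplication distributes over scalar addition for $\mathbb{R}^3$ as well.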
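(P8) Using the additive identity $0 \in \mathbb{R}$, we form the vector

\[ z = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \in \mathbb{R}^3. \]

This element is the additive identity in $\mathbb{R}^3$. Indeed,

\[ z + u = \begin{pmatrix} 0+a \\ 0+b \\ 0+c \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix} = u. \]

Therefore property (P8) holds for $\mathbb{R}^3$.
(P9) We know that $-a$, $-b$, and $-c$ are the additive inverses in $\mathbb{R}$ of $a$, $b$, and $c$, respectively. Using
these, we can form the vector

\[ \tilde{u} = \begin{pmatrix} -a \\ -b \\ -c \end{pmatrix} \in \mathbb{R}^3. \]

We see that $\tilde{u}$ is the additive inverse of $u$ as

\[ u + \tilde{u} = \begin{pmatrix} a + (-a) \\ b + (-b) \\ c + (-c) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} = z. \]

Thus, property (P9) is true for $\mathbb{R}^3$.
(P10) Finally, we use the multiplicative identity, $1 \in \mathbb{R}$. Indeed,

\[ 1 \cdot u = \begin{pmatrix} 1 \cdot a \\ 1 \cdot b \\ 1 \cdot c \end{pmatrix} = u. \]

Now, because all ten properties from Definition 2.3.5 hold true for $\mathbb{R}^3$, we know that $\mathbb{R}^3$ with
component-wise addition and scalar multiplication is a vector space over $\mathbb{R}$.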


In the above proof, many of the properties easily followed from the properties on $\mathbb{R}$ and did not
depend on the requirement that vectors in $\mathbb{R}^3$ were made of ordered triples. In most cases a person
would not go through the excruciating detail that we did in this proof. Because the operations are
defined component-wise and the components are elements of a vector space, we can shorten this proof
to the following proof.


Alternate Proof (for Example 2.3.8). Suppose $\mathbb{R}^3$ is defined as above with addition and scalar
multiplication defined component-wise. By definition of the operations on $\mathbb{R}^3$, the closure properties
(P1) and (P2) hold true. Notice that all operations occur in the components of each vector and are the
standard operations on $\mathbb{R}$. Therefore, since $\mathbb{R}$ with the standard operations is a vector space over itself,
properties (P3)-(P10) are all inherited from $\mathbb{R}$. Thus $\mathbb{R}^3$ with component-wise addition and scalar
multiplication is a vector space over $\mathbb{R}$.

In this proof, we said, “...properties (P3)-(P10) are all inherited from $\mathbb{R}$” to indicate that the
justification for each vector space property for $\mathbb{R}^3$ is just repeated use (in each component) of the
corresponding property for $\mathbb{R}$.
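
The "inherited" argument can be spot-checked numerically. The following is a minimal sketch, not part of the original text: the helper names `add`, `scale`, and `check_properties` are hypothetical, and exact rational arithmetic (`fractions.Fraction`) stands in for the real numbers so that equality tests are exact. A finite check like this illustrates properties (P3)-(P10) on a few sample vectors; it does not replace the proof.

```python
from fractions import Fraction
from itertools import product

def add(u, v):
    """Component-wise addition of two tuples of equal length."""
    return tuple(a + b for a, b in zip(u, v))

def scale(alpha, u):
    """Component-wise scalar multiplication."""
    return tuple(alpha * a for a in u)

# Sample data (chosen arbitrarily for illustration).
scalars = [Fraction(-2), Fraction(0), Fraction(1, 3), Fraction(5)]
vectors = [tuple(Fraction(x) for x in t)
           for t in [(1, 2, 3), (0, -1, 4), (-2, 0, 7)]]
zero = (Fraction(0),) * 3

def check_properties(vectors, scalars):
    """Assert (P3)-(P10) on every combination of the sample data."""
    for u, v, w in product(vectors, repeat=3):
        assert add(u, v) == add(v, u)                                  # (P3)
        assert add(add(u, v), w) == add(u, add(v, w))                  # (P4)
    for alpha, beta in product(scalars, repeat=2):
        for u, v in product(vectors, repeat=2):
            assert scale(alpha, scale(beta, u)) == scale(alpha * beta, u)            # (P5)
            assert scale(alpha, add(u, v)) == add(scale(alpha, u), scale(alpha, v))  # (P6)
            assert scale(alpha + beta, u) == add(scale(alpha, u), scale(beta, u))    # (P7)
    for u in vectors:
        assert add(zero, u) == u                                       # (P8)
        assert add(u, scale(-1, u)) == zero                            # (P9)
        assert scale(1, u) == u                                        # (P10)
    return True

check_properties(vectors, scalars)
```

Each assertion reduces to the same property of the components, which is exactly what "inherited from $\mathbb{R}$" means.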


Example 2.3.9 Because neither proof relied on the requirement that elements of $\mathbb{R}^3$ are ordered triples,
we see that a very similar proof would show that for any $n \in \mathbb{N}$,

$\mathbb{R}^n$ is a vector space over the scalar field $\mathbb{R}$.

Here $\mathbb{R}^n$ is the set of ordered $n$-tuples.


Caution: There are instances where a vector space has components that are elements of a vector space, 
but not all elements of this vector space are allowed as a component and the alternate proof does not 
work. So, for example, if we had an image set where we only allowed numbers between 0 and 1 in
the pixels, we would not have a vector space, because [0, 1] is not a field (we need the rest of the real 
numbers in order for the set to be closed under addition). 
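
A small sketch of the closure failure described in the caution, under the assumption that pixel values are restricted to the interval $[0, 1]$. The helper name `in_unit_interval` and the sample values are hypothetical.

```python
from fractions import Fraction

def in_unit_interval(x):
    """True when x is an allowed pixel value, i.e. lies in [0, 1]."""
    return 0 <= x <= 1

# Two allowed pixel values (arbitrary sample choices).
p = Fraction(7, 10)
q = Fraction(6, 10)

# Their sum is 13/10, which is no longer an allowed pixel value.
total = p + q
closed_for_this_pair = in_unit_interval(total)
```

Both `p` and `q` are allowed values, yet `total` is not, so the restricted image set is not closed under addition.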


Example 2.3.10 Let $T = \{a \in \mathbb{R} \mid a \neq 0\}$. Now, consider the set

\[ T^3 = \left\{ \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \;\middle|\; a_1, a_2, a_3 \in T \right\} \]

with addition and scalar multiplication defined component-wise. Notice that all components of $T^3$ are
real numbers because all elements of $T$ are real numbers. But $T$ does not include the real number 0
and this means $T^3$ is not a vector space over the field $\mathbb{R}$. Which property fails? Exercise 4 asks you to
answer this question.


Example 2.3.10 does not simplify the story either. Can you think of a vector space over R, made of 
ordered n-tuples, with addition and scalar multiplication defined component-wise, whose components 
are in R, and there are restrictions on how the components can be chosen? Exercise 8 asks you to 
explore this and determine whether or not it is possible. 


The operations are themselves very important in the definition of a vector space. If we define 
different operations on a set, the structure of the vector space, including identity and inverse elements, 
can be different. 


Example 2.3.11 Let us consider again the set of real numbers, $\mathbb{R}$, but with different operations. Define
the operation $\oplus$ to be multiplication ($u \oplus v = uv$) and $\odot$ to be exponentiation ($\alpha \odot u = u^{\alpha}$).
Notice that $\oplus$ is commutative but $\odot$ is not. We show that $(\mathbb{R}, \oplus, \odot)$ is not a vector space over $\mathbb{R}$.

(P1) We know that when we multiply two real numbers, we get a real number, thus $\mathbb{R}$ is closed under
$\oplus$ and this property holds.

(P2) However, $\mathbb{R}$ is not closed under $\odot$. For example, $(1/2) \odot (-1) = (-1)^{1/2} = \sqrt{-1}$ is not a real
number.

Since property (P2) does not hold, we do not need to continue checking the remaining eight properties.
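
The failure of (P2) can be observed directly in floating-point arithmetic. This is a minimal sketch: the function name `odot` is a hypothetical stand-in for the operation $\odot$ of Example 2.3.11, and it relies on the fact that Python 3 returns a complex number when a negative float is raised to a non-integer power.

```python
def odot(alpha, u):
    """The 'scalar multiplication' of Example 2.3.11: alpha (.) u = u ** alpha."""
    return u ** alpha

# (1/2) (.) (-1) = (-1) ** (1/2), the square root of -1.
result = odot(0.5, -1.0)

# Python reports a complex value, so the result has left the real numbers.
left_the_reals = isinstance(result, complex)
```

The value `result` is, up to rounding, the imaginary unit, which matches the computation in (P2).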




To emphasize how the definition of the operations can change a vector space, we offer more 
examples. 
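Example 2.3.12 Let $V = \mathbb{R}^2$. Let $\oplus$ and $\odot$ be defined as binary operations on $V$ in the following
way. If $u = \begin{pmatrix} a \\ b \end{pmatrix}$, $v = \begin{pmatrix} c \\ d \end{pmatrix} \in V$ and $\alpha \in \mathbb{R}$ (the scalar set), then

\[ u \oplus v = \begin{pmatrix} a + c - 1 \\ b + d - 1 \end{pmatrix} \quad\text{and}\quad \alpha \odot u = \begin{pmatrix} \alpha a + 1 - \alpha \\ \alpha b + 1 - \alpha \end{pmatrix}. \]

Then, $(V, \oplus, \odot)$ is a vector space over $\mathbb{R}$.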


Proof. Let $u = \begin{pmatrix} a \\ b \end{pmatrix}$, $v = \begin{pmatrix} c \\ d \end{pmatrix}$, $w = \begin{pmatrix} e \\ f \end{pmatrix} \in V$ and $\alpha, \beta \in \mathbb{R}$, and define $\oplus$ and $\odot$ as above. We
will show how three of the 10 properties given in Definition 2.3.5 are affected by the definition of the
operations. It is left to the reader to show the other 7 properties are true (Exercise 14). Here, we will
only prove the existence of inverses and identities.


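(P8) We will find the element $z \in V$ so that $u \oplus z = u$ for every $u \in V$. Suppose we let $z = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \in V$.
We will show that $z$ is the additive identity. Indeed,

\[ u \oplus z = \begin{pmatrix} a \\ b \end{pmatrix} \oplus \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} a + 1 - 1 \\ b + 1 - 1 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix} = u. \]

Thus the additive identity is, in fact, $z$. Therefore, property (P8) holds true.

(P9) Now, since $u \in V$, we will find the additive inverse, call it $\tilde{u}$, so that $u \oplus \tilde{u} = z$. Let $\tilde{u} = \begin{pmatrix} 2 - a \\ 2 - b \end{pmatrix}$.
We will show that $\tilde{u}$ is the additive inverse of $u$. Notice that

\[ u \oplus \tilde{u} = \begin{pmatrix} a \\ b \end{pmatrix} \oplus \begin{pmatrix} 2 - a \\ 2 - b \end{pmatrix} = \begin{pmatrix} a + 2 - a - 1 \\ b + 2 - b - 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

Therefore, $u \oplus \tilde{u} = z$. Thus, every element of $V$ has an additive inverse.
(P10) Here, we will show that $1 \in \mathbb{R}$ is the multiplicative identity. Indeed,

\[ 1 \odot u = 1 \odot \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 1 \cdot a + 1 - 1 \\ 1 \cdot b + 1 - 1 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix} = u. \]

Therefore, property (P10) holds for $(V, \oplus, \odot)$.

Because all 10 properties from Definition 2.3.5 hold true for $(V, \oplus, \odot)$, we know that $(V, \oplus, \odot)$ is a
vector space over $\mathbb{R}$.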


Notice that the “additive” identity was not a vector of zeros in this case. 
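
The three properties proved above can be checked on sample vectors. This is a sketch only: the names `oplus`, `odot`, `Z`, and `inverse` are hypothetical labels for the operations, the identity $z = (1, 1)$, and the inverse $(2 - a, 2 - b)$ from Example 2.3.12, and exact rationals are used so that the comparisons are exact.

```python
from fractions import Fraction

def oplus(u, v):
    """u (+) v = (a + c - 1, b + d - 1)."""
    return (u[0] + v[0] - 1, u[1] + v[1] - 1)

def odot(alpha, u):
    """alpha (.) u = (alpha*a + 1 - alpha, alpha*b + 1 - alpha)."""
    return (alpha * u[0] + 1 - alpha, alpha * u[1] + 1 - alpha)

Z = (1, 1)  # the "additive" identity found in (P8)

def inverse(u):
    """The additive inverse (2 - a, 2 - b) found in (P9)."""
    return (2 - u[0], 2 - u[1])

# Arbitrary sample vectors.
samples = [(Fraction(3), Fraction(-2)),
           (Fraction(1, 2), Fraction(7)),
           (Fraction(0), Fraction(0))]

for u in samples:
    assert oplus(u, Z) == u             # (P8): z is the identity
    assert oplus(u, inverse(u)) == Z    # (P9): inverse sums to z
    assert odot(1, u) == u              # (P10): 1 is the scalar identity
```

Note that the ordinary zero vector `(0, 0)` is just another element here: adding it with `oplus` shifts a vector by $-1$ in each component rather than leaving it unchanged.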
Fig. 2.12 Visualizing a vector in $\mathbb{R}^3$.


2.3.3 The Geometry of the Vector Space $\mathbb{R}^3$


We can visualize the vector space $\mathbb{R}^3$, with the standard operations and field, in 3D space. We typically
represent a vector in $\mathbb{R}^3$ with an arrow. For example, the vector $v = \begin{pmatrix} a \\ b \\ c \end{pmatrix}$ can be represented by the
arrow pointing from the origin (the 0 vector) to the point $(a, b, c)$ as in Figure 2.12. It can also be
represented as any translation of this arrow, that is, an arrow starting from any other vector in $\mathbb{R}^3$ and
pointing toward a point $a$ units further in the “x”-direction, $b$ units further in the “y”-direction, and
$c$ units further in the “z”-direction from the starting point. The natural question arises: What do the
vector space properties mean in the geometric context? In this section, we will discuss the geometry
of some of the vector space properties. The rest, we leave for the exercises.

The geometry we discuss here translates nicely to the vector space $\mathbb{R}^n$ (for any $n \in \mathbb{N}$) with standard
operations and field.


(P1) The Geometry of Closure under Addition This property says that if we add any two vectors in
$\mathbb{R}^3$ together, we never leave $\mathbb{R}^3$.

We briefly recall here the geometric interpretation of addition in $\mathbb{R}^3$. Using the definition of addition,
we have that if

\[ v = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \quad\text{and}\quad u = \begin{pmatrix} d \\ f \\ g \end{pmatrix}, \]

then

\[ v + u = \begin{pmatrix} a + d \\ b + f \\ c + g \end{pmatrix}. \]

That is, $v + u$ is the vector that can be represented by an arrow starting at 0 and pointing toward the
point $(a + d, b + f, c + g)$. We can see in Figure 2.13 that the sum is the vector that starts at 0 (the
start of $u$) and points to the end of $v$. Some describe this as drawing “tip to tail” because the tip of $u$ is
touching the tail of $v$.

Fig. 2.13 Geometric representation of vector addition in $\mathbb{R}^3$.

We see geometrically that if we translate a vector $v$ along the vector $u$, we can form a new vector
that ends at the point to where the tip of $v$ translated.

Of course, this new vector is still in $\mathbb{R}^3$, as indicated by the closure property.


(P2) The Geometry of Closure under Scalar Multiplication Let $\alpha$ be a scalar, then

\[ \alpha \cdot v = \begin{pmatrix} \alpha a \\ \alpha b \\ \alpha c \end{pmatrix}. \]

That is, $\alpha v$ can be represented by an arrow starting at 0 and ending at the point $(\alpha a, \alpha b, \alpha c)$. Now, if
$\alpha > 1$, this vector points in the same direction as $v$, but is longer. If $0 < \alpha < 1$, then $\alpha \cdot v$ still points
in the same direction as $v$, but is now shorter. Finally, if $\alpha = 1$, it is the multiplicative identity and
$\alpha \cdot v = v$. These can be seen in Figure 2.14.

We will see later that, if $\alpha < 0$ is a negative number, then $\alpha v$ will point in the opposite direction of
$v$ but will lie along the same line, and the length of the arrow will be $|\alpha|$ times the length of the arrow
representing $v$.

Any scalar multiple of $v$ is represented by an arrow that points along the line passing through the
arrow representing $v$. Property (P2) says that $\mathbb{R}^3$ contains the entire line that passes through the origin
and is parallel to $v$.


(P3) The Geometry of the Commutative Property If we translate $v$ along the vector $u$ (see
Figure 2.13) or we translate $u$ along the vector $v$ (we encourage the reader to make this drawing),
the vector formed will point toward the same point in $\mathbb{R}^3$. Thus, the commutative property shows us
that geometrically, it doesn’t matter in which order we traverse the two vectors, we will still end at the
same terminal point.

Fig. 2.14 Geometric representation of scalar multiplication in $\mathbb{R}^3$ ($0 < \alpha < 1$ and $\beta > 1$).


The remaining seven vector space properties can also be displayed through similar figures. We leave 
these remaining interpretations to Exercise 7. 


2.3.4 Properties of Vector Spaces 


A lot of this material can feel tedious, or abstract, or both, and it can be easy to lose sight of why 
we care about such a long list of properties. Overall, remember that we have defined vector spaces as 
abstract objects that have certain properties. We can prove theorems about vector spaces, and know 
that those theorems remain valid in the case of any specific vector space. In much of this book we build 
machinery that can be used in any vector space. This is what makes linear algebra so powerful for so 
many applications. It is also often useful to recognize when a vector space property does not hold
for a set, because then we know that we cannot use our tools on this set.

We can now deduce some fun facts that hold for all vector spaces. As you read through these 
properties, think about what they mean in the context of, say, the vector space of images, or any of the 
vector spaces from the previous section. 


Theorem 2.3.13
If $x$, $y$, and $z$ are vectors in a vector space $(V, +, \cdot)$ and $x + z = y + z$, then $x = y$.


Proof. Let $x$, $y$, and $z$ be vectors in a vector space. Assume $x + z = y + z$. We know that there exists
an additive inverse of $z$, call it $-z$. We will show that the properties given in Definition 2.3.5 imply
that $x = y$. Indeed,

\begin{align*}
x &= x + 0 && \text{(P8)} \\
  &= x + (z + (-z)) && \text{(P9)} \\
  &= (x + z) + (-z) && \text{(P4)} \\
  &= (y + z) + (-z) && \text{(assumption)} \\
  &= y + (z + (-z)) && \text{(P4)} \\
  &= y + 0 && \text{(P9)} \\
  &= y. && \text{(P8)}
\end{align*}


Each step in the preceding proof is justified by either the use of our initial assumption or a known
property of a vector space. The theorem also leads us to the following corollary¹.


Corollary 2.3.14 
The zero vector in a vector space is unique. Also, every vector in a vector space has a unique 
additive inverse. 


Proof. We show that the zero vector is unique and leave the remainder as Exercise 11. We consider
two arbitrary zero vectors, $0$ and $0'$, and show that $0 = 0'$. We know that $0 + x = x = 0' + x$. By
Theorem 2.3.13, we now conclude that $0 = 0'$. Therefore, the zero vector is unique.


The next two theorems state that whenever we multiply by zero (either the scalar zero or the zero 
vector), the result is always the zero vector. 


Theorem 2.3.15
Let $(V, +, \cdot)$ be a vector space over $\mathbb{R}$. Then $0 \cdot x = 0$ for each vector $x \in V$.


The proof follows from Exercise 12, which shows that the vector $0 \cdot x$ (here $0 \in \mathbb{R}$ and $x \in V$) is
the additive identity in $V$.


Theorem 2.3.16
Let $(V, +, \cdot)$ be a vector space over $\mathbb{R}$. Then $\alpha \cdot 0 = 0$ for each scalar $\alpha \in \mathbb{R}$.


The proof follows from Exercise 13, which shows that the vector $\alpha \cdot 0$ (here $\alpha \in \mathbb{R}$ and $0 \in V$) is
the additive identity.

¹ A corollary is a result whose proof follows from a preceding theorem.


Theorem 2.3.17
Let $(V, +, \cdot)$ be a vector space over $\mathbb{R}$. Then $(-\alpha) \cdot x = -(\alpha \cdot x) = \alpha \cdot (-x)$ for each $\alpha \in \mathbb{R}$
and each $x \in V$.


In this theorem it is important to note that “$-$” indicates an additive inverse, not to be confused with
a negative sign. Over the set of real (or complex) numbers, the additive inverse is the negative value,
while for vector spaces over other fields (including $Z_2$) this is not necessarily the case. The theorem
states the equivalence of three vectors:

1. A vector $x$ multiplied by the additive inverse of a scalar $\alpha$, $(-\alpha) \cdot x$.
2. The additive inverse of the product of a scalar $\alpha$ and a vector $x$, $-(\alpha \cdot x)$.
3. The additive inverse of a vector $x$ multiplied by a scalar $\alpha$, $\alpha \cdot (-x)$.


While these equivalences may seem “obvious” in the context of real numbers, we must be careful to 
verify these properties using only the established properties of vector spaces. 


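Proof. Let $(V, +, \cdot)$ be a vector space over $\mathbb{R}$. Let $\alpha \in \mathbb{R}$ and $x \in V$. We begin by showing $(-\alpha) \cdot x =
-(\alpha \cdot x)$. Notice that

\begin{align*}
(-\alpha) \cdot x &= (-\alpha) \cdot x + 0 && \text{(P8)} \\
  &= (-\alpha) \cdot x + \bigl( (\alpha \cdot x) + (-(\alpha \cdot x)) \bigr) && \text{(P9)} \\
  &= \bigl( (-\alpha) \cdot x + \alpha \cdot x \bigr) + (-(\alpha \cdot x)) && \text{(P4)} \\
  &= (-\alpha + \alpha) \cdot x + (-(\alpha \cdot x)) && \text{(P7)} \\
  &= 0 \cdot x + (-(\alpha \cdot x)) && \text{(scalar addition)} \\
  &= 0 + (-(\alpha \cdot x)) && \text{(Theorem 2.3.15)} \\
  &= -(\alpha \cdot x). && \text{(P8)}
\end{align*}

Therefore, $(-\alpha) \cdot x = -(\alpha \cdot x)$. Exercise 15 gives us that $-(\alpha \cdot x) = \alpha \cdot (-x)$. Using transitivity of
equality, we know that $(-\alpha) \cdot x = -(\alpha \cdot x) = \alpha \cdot (-x)$.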


Of particular interest is the special case of $\alpha = 1$. Since $1 \cdot x = x$ by (P10), we see that
$(-1)x = -(1 \cdot x) = -x$. That is, the additive inverse of any vector $x$ is obtained by multiplying the
additive inverse of the multiplicative identity scalar by the vector $x$.
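
These facts are easy to believe in $\mathbb{R}^n$, so it can be instructive to test them in the less familiar space of Example 2.3.12, where the zero vector is $(1, 1)$. The sketch below restates that example's operations under hypothetical names (`oplus`, `odot`, `Z`, `inverse`, `check_scalar_facts`) and checks Theorems 2.3.15 through 2.3.17 on one sample vector and scalar.

```python
from fractions import Fraction

def oplus(u, v):
    """Addition from Example 2.3.12."""
    return (u[0] + v[0] - 1, u[1] + v[1] - 1)

def odot(alpha, u):
    """Scalar multiplication from Example 2.3.12."""
    return (alpha * u[0] + 1 - alpha, alpha * u[1] + 1 - alpha)

Z = (1, 1)  # zero vector of this space

def inverse(u):
    """Additive inverse in this space."""
    return (2 - u[0], 2 - u[1])

def check_scalar_facts(u, alpha):
    """Check Theorems 2.3.15-2.3.17 for one vector u and one scalar alpha."""
    assert odot(0, u) == Z                      # Theorem 2.3.15: 0 . x = 0
    assert odot(alpha, Z) == Z                  # Theorem 2.3.16: alpha . 0 = 0
    lhs = odot(-alpha, u)
    assert lhs == inverse(odot(alpha, u))       # (-alpha) . x = -(alpha . x)
    assert lhs == odot(alpha, inverse(u))       # (-alpha) . x = alpha . (-x)
    assert odot(-1, u) == inverse(u)            # special case: (-1) x = -x
    return True

check_scalar_facts((Fraction(3), Fraction(-2)), Fraction(5, 2))
```

Here "multiplying by zero" produces $(1, 1)$ rather than $(0, 0)$, which is consistent with the theorems because $(1, 1)$ is the additive identity of this space.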


2.3.5 Exercises 


1. In Example 2.2.27, we considered a system of two equations in two variables, and we visualized
the solution in $\mathbb{R}^2$. To find the solution we sketched two lines and found the intersection. Sketch
another geometric representation of the solution as the linear combination of the two vectors
$u = (1, 1)$ and $v = (2, 3)$ that yields $(2, -3)$. Your sketch should include the vectors $u$ and $v$.

2. Let $\oplus$ and $\odot$ be defined on $\mathbb{R}$ so that if $a, b \in \mathbb{R}$ then $a \oplus b = a + b + 1$ and $a \odot b = ab - a + 1$.
Is $(\mathbb{R}, \oplus, \odot)$ a vector space over $\mathbb{R}$? Justify.


3. Let $V = \mathbb{R}^n$ and $z$ a fixed element of $V$. For arbitrary elements $x$ and $y$ in $V$ and arbitrary scalar
$\alpha$ in $\mathbb{R}$, define vector addition ($\oplus$) and scalar multiplication ($\odot$) as follows:

$x \oplus y = x + y - z$, and $\alpha \odot x = \alpha(x - z) + z$.

Show that $(V, \oplus, \odot)$ is a vector space over $\mathbb{R}$.

4. In Example 2.3.10, we stated that $T^3$ is not a vector space. List all properties that fail to be true.
Justify your assertions.

5. Define $\oplus$ and $\odot$ on $\mathbb{R}$ so that if $a, b \in \mathbb{R}$, $a \oplus b = ab$ and $a \odot b = a + b$. Is $(\mathbb{R}, \oplus, \odot)$ a vector
space over $\mathbb{R}$? Justify.

6. Consider the set $V = \mathbb{R}^2$. Let $(a_1, a_2)$ and $(b_1, b_2)$ be in $V$ and $\alpha$ in $\mathbb{R}$. Define vector addition and
scalar multiplication as follows:

$(a_1, a_2) + (b_1, b_2) = (a_1 + b_1, 0)$, and $\alpha \cdot (a_1, a_2) = (\alpha a_1, \alpha a_2)$.

Show that the set $V$ is not a vector space.

7. Draw similar geometric interpretations for the remaining seven vector space properties not
discussed in Section 2.3.3.

8. Find a vector space over $\mathbb{R}$, made of ordered $n$-tuples (you choose $n > 1$), with addition and scalar
multiplication defined component-wise, whose components are in $\mathbb{R}$, but there are restrictions on
how the components can be chosen.

9. Consider the set $\mathbb{R}$. Define vector addition and scalar multiplication so that vector space property
(P3) is true, but property (P6) is false.

10. Consider the set $\mathbb{R}$. Define vector addition and scalar multiplication so that vector space property
(P6) is true, but property (P7) is false.

11. Prove that every vector in a vector space has a unique additive inverse to complete the proof of
Corollary 2.3.14.

12. Given that $(V, +, \cdot)$ is a vector space over $F$, show that if $x \in V$ then $0 \cdot x$ is the additive identity in
$V$. Hint: Strategically choose a vector to add to $0 \cdot x$ and then use one of the distributive properties.

13. Complete the proof of Theorem 2.3.16 by proving the following statement and then employing a
theorem from Section 2.3.

    Let $(V, +, \cdot)$ be a vector space over $\mathbb{R}$ and let 0 denote the additive identity in $V$, then $\alpha \cdot 0$
    is the additive identity in $V$.

14. Complete the proof in Example 2.3.12.

15. Complete the proof of Theorem 2.3.17 by writing a proof of the needed result.

16. Consider the set of grayscale images on the map of Africa in Figure 2.11. Create a plausible scenario
describing the meaning of pixel intensity. Image addition and scalar multiplication should have a
reasonable interpretation in your scenario. Describe these interpretations.

17. (Image Warping) In Chapter 1, we discussed the application of image warping. In order to set
up a vector space of images for this task, we will need to add and scale the images as defined in
Definitions 2.3.2 and 2.3.3. In order to add two images, what must be required of all images?


2.4 Vector Space Examples 


In the previous section, we defined vector spaces as sets that have some of the properties of the
real numbers. We looked at two examples in that section: the space of images (given a fixed pixel
arrangement) and the Euclidean spaces $\mathbb{R}^n$. We now present a collection of other useful vector spaces
to give you a flavor of how broad the vector space framework is. We will use these vector spaces in
exercises and examples throughout the book.


2.4.1 Diffusion Welding and Heat States 


In this section, we begin a deeper look into the mathematics for the diffusion welding application 
discussed in Chapter 1. Recall that diffusion welding can be used to adjoin several smaller rods into a 
single longer rod, leaving the final rod just after welding with varying temperature along the rod but 
with the ends having the same temperature. Recall that we measure the temperature along the rod and 
obtain a heat signature like the one seen in Figure 1.4 of Chapter 1. Recall also that the heat signature
shows the temperature difference from the temperature at the ends of the rod. Thus, the initial signature 
(along with any subsequent signature) will show values of 0 at the ends. 

The heat signature along the rod can be described by a function $f : [0, L] \to \mathbb{R}$, where $L$ is the
length of the rod and $f(0) = f(L) = 0$. The quantity $f(x)$ is the temperature difference on the rod
at a position $x$ in the interval $[0, L]$. Because we are detecting and storing heat measurements along
the rod, we are only able to collect finitely many such measurements. Thus, we discretize the heat
signature $f$ by sampling at only $m$ locations along the bar. If we space the $m$ sampling locations
equally, then for $\Delta x = \frac{L}{m+1}$, we can choose the sampling locations to be $\Delta x, 2\Delta x, \ldots, m\Delta x$. Since
the heat measurement is zero (and fixed) at the endpoints we do not need to sample there. The set of
discrete heat measurements at a given time is called a heat state, as opposed to a heat signature, which,
as discussed earlier, is defined at every point along the rod. We can record the heat state as the vector

\[ u = [u_0, u_1, u_2, \ldots, u_m, u_{m+1}] = [0, f(\Delta x), f(2\Delta x), \ldots, f(m\Delta x), 0]. \]

Here, if $u_j = f(x)$ for some $x \in [0, L]$ then $u_{j+1} = f(x + \Delta x)$ and $u_{j-1} = f(x - \Delta x)$. Figure 2.15
shows a (continuous) heat signature as a solid blue curve and the corresponding measured (discretized)
heat state indicated by the regularly sampled points marked as circles.


Fig. 2.15 A 1D heat signature, $f$, is shown as a curve. The corresponding heat state is the discrete collection of
$m + 2$ regularly sampled temperatures, $\{u_0, u_1, \ldots, u_{m+1}\}$, with $u_k = f(k\Delta x)$, shown as dots. Both heat signature
and heat state have zero temperature at the end points $x = 0$ and $x = L$.

As the heat diffuses through the rod, the new heat signatures will also be described by functions
$f_t : [0, L] \to \mathbb{R}$, where $t$ is the time measured since the welding was completed. The discretized heat
states corresponding to these signatures will form vectors as well.

We define scalar multiplication and vector addition of heat states component-wise (in the same way
we define the operations on vectors in $\mathbb{R}^{m+2}$). Denote the set of all heat states with $m + 2$ entries
(assumed to have zero temperature at the endpoints) by $H_m(\mathbb{R})$. With the operations of addition and
scalar multiplication, the set $H_m(\mathbb{R})$ is a vector space (see Exercise 3).
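
The discretization and the component-wise operations can be sketched as follows. This is an illustration under stated assumptions, not the book's implementation: the function names (`heat_state`, `add_states`, `scale_state`) are hypothetical, the spacing is taken to be $\Delta x = L/(m+1)$ so that the $m$ sample points lie strictly between the endpoints, and the two sample signatures are arbitrary functions that vanish at $x = 0$ and $x = L$.

```python
import math

def heat_state(f, L, m):
    """Sample the heat signature f at m interior points; pad with zero endpoints."""
    dx = L / (m + 1)  # assumed spacing: m interior points between 0 and L
    interior = [f(k * dx) for k in range(1, m + 1)]
    return [0.0] + interior + [0.0]

def add_states(u, v):
    """Component-wise addition of two heat states of equal length."""
    return [a + b for a, b in zip(u, v)]

def scale_state(alpha, u):
    """Component-wise scalar multiplication of a heat state."""
    return [alpha * a for a in u]

L, m = 1.0, 4

# Two hypothetical heat signatures, each zero at both ends of the rod.
f = lambda x: math.sin(math.pi * x / L)
g = lambda x: x * (L - x)

u = heat_state(f, L, m)
v = heat_state(g, L, m)

# A linear combination of heat states is again a heat state:
# it has m + 2 entries and zero temperature at both endpoints.
w = add_states(scale_state(2.0, u), v)
```

The endpoints of `w` stay at zero because each is computed as `2.0 * 0.0 + 0.0`, which is the closure argument for $H_m(\mathbb{R})$ in miniature.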


2.4.2 Function Spaces 


We have seen that the set of discretized heat states of the preceding example forms a vector space.
These discretized heat states can be viewed as real-valued functions on the set of $m + 2$ points that are
the sampling locations along the rod. In fact, function spaces such as $H_m(\mathbb{R})$ are very common and
useful constructs for solving many physical problems. The following are some such function spaces.


Example 2.4.1 Let $\mathcal{F} = \{f : \mathbb{R} \to \mathbb{R}\}$, the set of all functions whose domain is $\mathbb{R}$ and whose range
consists of only real numbers.

We define addition and scalar multiplication (on functions) pointwise. That is, given two functions
$f$ and $g$ and a real scalar $\alpha$, we define the sum $f + g$ by $(f + g)(x) := f(x) + g(x)$ and the scalar
product $\alpha f$ by $(\alpha f)(x) := \alpha \cdot (f(x))$. Then $(\mathcal{F}, +, \cdot)$ is a vector space with scalars taken from $\mathbb{R}$.


Proof. Let $f, g, h \in \mathcal{F}$ and $\alpha, \beta \in \mathbb{R}$. We verify the 10 properties of Definition 2.3.5.

• Since $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ and based on the definition of addition, $f + g : \mathbb{R} \to \mathbb{R}$. So
$f + g \in \mathcal{F}$ and $\mathcal{F}$ is closed under addition.

• Similarly, $\mathcal{F}$ is closed under scalar multiplication.

• Addition is commutative since

\begin{align*}
(f + g)(x) &= f(x) + g(x) \\
           &= g(x) + f(x) \\
           &= (g + f)(x).
\end{align*}

So, $f + g = g + f$.

• And, addition is associative because

\begin{align*}
((f + g) + h)(x) &= (f + g)(x) + h(x) \\
                 &= (f(x) + g(x)) + h(x) \\
                 &= f(x) + (g(x) + h(x)) \\
                 &= f(x) + (g + h)(x) \\
                 &= (f + (g + h))(x).
\end{align*}

So $(f + g) + h = f + (g + h)$.

• We see, also, that scalar multiplication is associative. Indeed,

\[ (\alpha \cdot (\beta \cdot f))(x) = \alpha \cdot (\beta \cdot f(x)) = (\alpha\beta) f(x) = ((\alpha\beta) \cdot f)(x). \]

So $\alpha \cdot (\beta \cdot f) = (\alpha\beta) \cdot f$.

• Using the distributive law for real numbers, we see that scalar multiplication distributes over vector
addition here as well since

\begin{align*}
(\alpha \cdot (f + g))(x) &= \alpha \cdot (f + g)(x) \\
                          &= \alpha \cdot (f(x) + g(x)) \\
                          &= \alpha \cdot f(x) + \alpha \cdot g(x) \\
                          &= (\alpha \cdot f + \alpha \cdot g)(x).
\end{align*}

So $\alpha(f + g) = \alpha f + \alpha g$.

• Using the definition of scalar multiplication and the distributive law for real numbers, we see that
scalar multiplication distributes over scalar addition by the following computation:

\begin{align*}
((\alpha + \beta) \cdot f)(x) &= (\alpha + \beta) \cdot f(x) \\
                              &= \alpha \cdot f(x) + \beta \cdot f(x) \\
                              &= (\alpha \cdot f + \beta \cdot f)(x).
\end{align*}

So, $(\alpha + \beta) \cdot f = \alpha \cdot f + \beta \cdot f$.

• Now, consider the constant function $z(x) = 0$ for every $x \in \mathbb{R}$. We see that

\begin{align*}
(z + f)(x) &= z(x) + f(x) \\
           &= 0 + f(x) = f(x).
\end{align*}

That is, $z + f = f$. Thus, the function $z$ is the zero vector in $\mathcal{F}$.

• Now, let $g$ be the function defined by $g(x) = -f(x)$ for every $x \in \mathbb{R}$. We will show that $g$ is
the additive inverse of $f$. Observe, $(f + g)(x) = f(x) + g(x) = f(x) + (-f(x)) = 0 = z(x)$,
where $z$ is defined above. So, $f + g = z$. That is, $g$ is the additive inverse of $f$. As $f$ is an arbitrary
function in $\mathcal{F}$, we know that all functions in $\mathcal{F}$ have additive inverses.

• Since $(1 \cdot f)(x) = 1 \cdot f(x) = f(x)$, we have $1 \cdot f = f$. And, we see that the scalar identity is the real number 1.

Therefore, since all properties from Definition 2.3.5 are true, we know that $(\mathcal{F}, +, \cdot)$ is a vector space
over $\mathbb{R}$.


In the vector space $\mathcal{F}$, vectors are functions. Example vectors in $\mathcal{F}$ include $\sin(x)$, $x^2 + 3x - 5$,
and $e^x + 2$. Not all functions are vectors in $\mathcal{F}$ (see Exercise 8).
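
Pointwise operations translate naturally into functions that return functions. The sketch below is illustrative only: `f_add`, `f_scale`, and `zero` are hypothetical names for the sum, the scalar product, and the zero vector $z$ of Example 2.4.1.

```python
import math

def f_add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def f_scale(alpha, f):
    """Pointwise scalar product: (alpha f)(x) = alpha * f(x)."""
    return lambda x: alpha * f(x)

# The zero vector of the function space: the constant function z(x) = 0.
zero = lambda x: 0.0

# Two sample vectors in the space.
f = math.sin
g = lambda x: x**2 + 3*x - 5

# The linear combination h = f + 2g is again a function from R to R.
h = f_add(f, f_scale(2.0, g))
```

Evaluating `h` at any real `x` gives `sin(x) + 2(x^2 + 3x - 5)`, and `f_add(zero, f)` agrees with `f` everywhere, mirroring the zero-vector step of the proof.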


Example 2.4.2 Let $\mathcal{P}_n(\mathbb{R})$ be the set of all polynomials of degree less than or equal to $n$ with
coefficients from $\mathbb{R}$. That is,

\[ \mathcal{P}_n(\mathbb{R}) = \{ a_0 + a_1 x + \cdots + a_n x^n \mid a_k \in \mathbb{R},\ k = 0, 1, \ldots, n \}. \]

Let $f(x) = a_0 + a_1 x + \cdots + a_n x^n$ and $g(x) = b_0 + b_1 x + \cdots + b_n x^n$ be polynomials in $\mathcal{P}_n(\mathbb{R})$ and
$\alpha \in \mathbb{R}$. Define addition and scalar multiplication component-wise:

\[ (f + g)(x) = (a_0 + b_0) + (a_1 + b_1)x + \cdots + (a_n + b_n)x^n, \]

\[ (\alpha \cdot f)(x) = (\alpha a_0) + (\alpha a_1)x + \cdots + (\alpha a_n)x^n. \]

With these definitions, $\mathcal{P}_n(\mathbb{R})$ is a vector space (see Exercise 11).
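
One way to see these operations at work is to store a polynomial as its list of coefficients $[a_0, a_1, \ldots, a_n]$. This representation and the helper names (`poly_add`, `poly_scale`, `poly_eval`) are assumptions made for illustration; the sample polynomials are arbitrary elements of $\mathcal{P}_2(\mathbb{R})$.

```python
def poly_add(p, q):
    """Add two polynomials given as equal-length coefficient lists."""
    return [a + b for a, b in zip(p, q)]

def poly_scale(alpha, p):
    """Multiply every coefficient by the scalar alpha."""
    return [alpha * a for a in p]

def poly_eval(p, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n at x."""
    return sum(a * x**k for k, a in enumerate(p))

p = [1, 0, 2]    # 1 + 2x^2
q = [3, -1, 0]   # 3 - x  (stored with a zero x^2 coefficient)

s = poly_add(p, q)      # coefficients of (1 + 2x^2) + (3 - x)
t = poly_scale(3, p)    # coefficients of 3(1 + 2x^2)
```

Adding coefficient lists agrees with adding the polynomial values at every point, for example `poly_eval(s, 2)` equals `poly_eval(p, 2) + poly_eval(q, 2)`.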
Example 2.4.3 A function $f : \mathbb{R} \to \mathbb{R}$ is called an even function if $f(-x) = f(x)$ for all $x \in \mathbb{R}$.
The set of even functions is a vector space over $\mathbb{R}$ with the definitions of vector addition and scalar
multiplication given in Example 2.4.1 (see Exercise 9).


Example 2.4.4 Fix real numbers $a$ and $b$ with $a < b$. Let $C([a, b])$ be the set of all continuous functions
$f : [a, b] \to \mathbb{R}$. Then, using limits from Calculus, we can show that $C([a, b])$ is a vector space with
the operations of function addition and scalar multiplication defined analogously to the corresponding
operations on $\mathcal{F}$.


Proof. Suppose $a, b \in \mathbb{R}$ with $a < b$. Let $f, g \in C([a, b])$ and let $\alpha, \beta \in \mathbb{R}$. We know that $f, g \in
C([a, b])$ implies that, for any $c \in [a, b]$, we have

\[ \lim_{x \to c} f(x) = f(c) \quad\text{and}\quad \lim_{x \to c} g(x) = g(c). \]

We will show that the 10 properties from Definition 2.3.5 hold true.

• Using properties of limits, we know that

\begin{align*}
\lim_{x \to c} (f + g)(x) &= \lim_{x \to c} (f(x) + g(x)) \\
                          &= \lim_{x \to c} f(x) + \lim_{x \to c} g(x) \\
                          &= f(c) + g(c) \\
                          &= (f + g)(c).
\end{align*}

Therefore $f + g$ is continuous on $[a, b]$ and $C([a, b])$ is closed under addition.

• Now,

\begin{align*}
\lim_{x \to c} (\alpha \cdot f)(x) &= \lim_{x \to c} \alpha f(x) \\
                                   &= \alpha \lim_{x \to c} f(x) \\
                                   &= \alpha f(c) \\
                                   &= (\alpha \cdot f)(c).
\end{align*}

Thus, $\alpha f \in C([a, b])$. So, $C([a, b])$ is closed under scalar multiplication.

• Since $f, g \in \mathcal{F}$ and no other property from Definition 2.3.5 requires the use of limits, we can say
that the last 8 properties are inherited from $\mathcal{F}$.

Therefore, $(C([a, b]), +, \cdot)$ is a vector space over $\mathbb{R}$.


Example 2.4.5 Consider the set $C_0([a, b]) \subset C([a, b])$ where $C_0([a, b]) := \{f \in C([a, b]) : f(a) =
f(b) = 0\}$. Then $C_0([a, b])$ is the set of all continuous functions $f : [a, b] \to \mathbb{R}$ that are zero at the
endpoints. The set $C_0([a, b])$ is a vector space with the operations of function addition and scalar multiplication
inherited from $C([a, b])$. (See Exercise 23.)


In fact, the space of heat signatures on a 1D rod of length $L$ is modeled by the vector space $C_0([a, b])$
in Example 2.4.5 with $a = 0$ and $b = L$.

2.4.3 Matrix Spaces


A matrix is an array of real numbers arranged in a rectangular grid; for example, let

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 5 & 7 & 9 \end{pmatrix}. \]

The matrix $A$ has 2 rows (horizontal) and 3 columns (vertical), so we say it is a $2 \times 3$ matrix. In general,
a matrix $B$ with $m$ rows and $n$ columns is called an $m \times n$ matrix. We say the dimensions of the matrix
are $m$ and $n$.

Any two matrices with the same dimensions are added together by adding their entries entry-wise.
A matrix is multiplied by a scalar by multiplying all of its entries by that scalar (that is, multiplication
of a matrix by a scalar is also an entry-wise operation, as in Example 2.3.8).


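Example 2.4.6 Let

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 5 & 7 & 9 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & 1 \\ -2 & 1 & 0 \end{pmatrix}, \quad\text{and}\quad C = \begin{pmatrix} 1 & 2 \\ 3 & 5 \end{pmatrix}. \]

Then

\[ A + B = \begin{pmatrix} 2 & 2 & 4 \\ 3 & 8 & 9 \end{pmatrix}, \]

but since $A \in \mathcal{M}_{2 \times 3}$ and $C \in \mathcal{M}_{2 \times 2}$, the definition of matrix addition does not work to compute
$A + C$. That is, $A + C$ is undefined.

Using the definition of scalar multiplication, we get

\[ 3 \cdot A = \begin{pmatrix} 3(1) & 3(2) & 3(3) \\ 3(5) & 3(7) & 3(9) \end{pmatrix} = \begin{pmatrix} 3 & 6 & 9 \\ 15 & 21 & 27 \end{pmatrix}. \]

With this understanding of operations on matrices, we can now discuss $(\mathcal{M}_{m \times n}, +, \cdot)$ as a vector
space over $\mathbb{R}$.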


Theorem 2.4.7

Let $m$ and $n$ be in $\mathbb{N}$, the set of natural numbers. The set $\mathcal{M}_{m \times n}$ of all $m \times n$ matrices with
real entries together with the operations of addition and scalar multiplication is a vector space
over $\mathbb{R}$.


Proof. Consider the set $\mathcal{M}_{m \times n}$ of real-valued $m \times n$ matrices. Because matrix addition and scalar
multiplication are defined entry-wise, neither operation results in a matrix of a different size. Thus, $\mathcal{M}_{m \times n}$ is
closed under addition and scalar multiplication. Now, since the entries of every matrix in $\mathcal{M}_{m \times n}$ are
all real, and addition and scalar multiplication are defined entry-wise, we can say that $\mathcal{M}_{m \times n}$ inherits the remaining
vector space properties from $\mathbb{R}$. Thus, $(\mathcal{M}_{m \times n}, +, \cdot)$ is a vector space over $\mathbb{R}$.


To make the notation less tedious, it is common to write $A = (a_{i,j})_{m \times n}$ or $A = (a_{i,j})$ to denote an
$m \times n$ matrix whose entries are represented by $a_{i,j}$, where $i, j$ indicates the position of the entry. That
is, $a_{i,j}$ is the entry in matrix $A$ that is located in the $i$th row and $j$th column. For example, the number
in the first row and second column of the matrix $A$ above is denoted by $a_{1,2}$ and $a_{1,2} = 2$. With this
notation, we can formally define matrix addition and scalar multiplication.


Definition 2.4.8

Let $A = (a_{i,j})$ and $B = (b_{i,j})$ be $m \times n$ matrices. We define matrix addition by

\[ A + B = (a_{i,j} + b_{i,j}). \]

Let $\alpha \in \mathbb{R}$. We define scalar multiplication by $\alpha A = (\alpha a_{i,j})$.
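
Definition 2.4.8 can be sketched with matrices stored as lists of rows. The helper names (`shape`, `mat_add`, `mat_scale`) are hypothetical, and the matrices `B` and `C` below are made-up examples chosen only for their sizes; `A` is the $2 \times 3$ matrix used in the text. Raising an error for mismatched dimensions is one possible way to model "undefined".

```python
def shape(M):
    """Return (number of rows, number of columns) of a matrix stored as a list of rows."""
    return (len(M), len(M[0]))

def mat_add(A, B):
    """Entry-wise sum (a_ij + b_ij); only defined when the dimensions agree."""
    if shape(A) != shape(B):
        raise ValueError("matrix addition is undefined for different dimensions")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def mat_scale(alpha, A):
    """Entry-wise scalar multiple (alpha * a_ij)."""
    return [[alpha * a for a in row] for row in A]

A = [[1, 2, 3], [5, 7, 9]]   # the 2 x 3 matrix A from the text
B = [[0, 1, 0], [1, 0, 1]]   # hypothetical 2 x 3 matrix
C = [[1, 0], [0, 1]]         # hypothetical 2 x 2 matrix

total = mat_add(A, B)        # defined: both are 2 x 3
tripled = mat_scale(3, A)    # entry-wise multiple of A

try:
    mat_add(A, C)            # undefined: 2 x 3 versus 2 x 2
    sum_defined = True
except ValueError:
    sum_defined = False
```

Both operations return a matrix with the same shape as their input, which is the closure observation used in the proof of Theorem 2.4.7.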


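Caution: The spaces $\mathcal{M}_{m \times n}$ and $\mathcal{M}_{n \times m}$ are not the same. The matrix

\[ M_1 = \begin{pmatrix} 1 & -2 & 3 \\ 0 & 4 & 10 \\ 7 & 7 & 8 \\ -1 & 0 & 0 \end{pmatrix} \]

is a $4 \times 3$ matrix, whereas

\[ M_2 = \begin{pmatrix} 1 & 0 & 7 & -1 \\ -2 & 4 & 7 & 0 \\ 3 & 10 & 8 & 0 \end{pmatrix} \]

is a $3 \times 4$ matrix. As we saw above, these two matrices cannot be added together.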


2.4.4 Solution Spaces 


In this section, we consider solution sets of linear equations. If an equation has $n$ variables, its solution
set is a subset of $\mathbb{R}^n$. When is this set a vector space?


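Example 2.4.9 Let $V \subseteq \mathbb{R}^3$ be the set of all solutions to the equation $x_1 + 3x_2 - x_3 = 0$. That is,

\[ V = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in \mathbb{R}^3 \;\middle|\; x_1 + 3x_2 - x_3 = 0 \right\}. \]

The set $V$ together with the operations $+$ and $\cdot$ inherited from $\mathbb{R}^3$ forms a vector space.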


Proof. Let $V$ be the set defined above and let

$$u = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}, \quad v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \in V.$$

Let $\alpha \in \mathbb{R}$. Notice that Properties (P3)–(P7) and (P10) of Definition 2.3.5 only depend on the definition
of addition and scalar multiplication on $\mathbb{R}^3$ and, therefore, are inherited properties from $\mathbb{R}^3$. Hence we
need only check properties (P1), (P2), (P8), and (P9). Since $u, v \in V$, we know that

$$u_1 + 3u_2 - u_3 = 0 \quad\text{and}\quad v_1 + 3v_2 - v_3 = 0.$$

We will use this result to show the closure properties.
First, notice that $u + v = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{pmatrix}$. Notice also that

$$(u_1 + v_1) + 3(u_2 + v_2) - (u_3 + v_3) = (u_1 + 3u_2 - u_3) + (v_1 + 3v_2 - v_3) = 0 + 0 = 0.$$

Thus, $u + v \in V$. Since $u$ and $v$ are arbitrary vectors in $V$, it follows that $V$ is closed under addition.


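Next, notice that $\alpha \cdot u = \begin{pmatrix} \alpha u_1 \\ \alpha u_2 \\ \alpha u_3 \end{pmatrix}$ and

$$\alpha u_1 + 3\alpha u_2 - \alpha u_3 = \alpha(u_1 + 3u_2 - u_3) = \alpha(0) = 0.$$

Therefore, $\alpha u \in V$. Hence $V$ is closed under scalar multiplication.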
The zero vector $z = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \in \mathbb{R}^3$ is also the zero vector in $V$ because $0 + 3 \cdot 0 - 0 = 0$ and
$z + u = u$.
Finally, notice that

$$(-u_1) + 3(-u_2) - (-u_3) = -(u_1 + 3u_2 - u_3) = 0.$$

Therefore, $-u = \begin{pmatrix} -u_1 \\ -u_2 \\ -u_3 \end{pmatrix} \in V$. Since $-u$ is the additive inverse of $u$ in $\mathbb{R}^3$, we know that $V$ contains
additive inverses as well.
Therefore, since all properties in Definition 2.3.5 hold, $(V, +, \cdot)$ is a vector space over $\mathbb{R}$.


We can also visualize this space as a plane in $\mathbb{R}^3$ through the origin. In this case the space $V$ is
a plane in $\mathbb{R}^3$ that passes through the points $(0, 0, 0)$, $(1, 0, 1)$, and $(0, 1, 3)$ (three arbitrarily chosen
solutions of $x_1 + 3x_2 - x_3 = 0$).
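The closure argument can be spot-checked numerically. The following is a minimal sketch using the solutions $(1, 0, 1)$ and $(0, 1, 3)$ named above; the helper names `in_V`, `add`, and `scale` are illustrative, and checking a few vectors is not a substitute for the proof.

```python
def in_V(x):
    """True when x = (x1, x2, x3) solves x1 + 3*x2 - x3 = 0."""
    x1, x2, x3 = x
    return x1 + 3 * x2 - x3 == 0


def add(u, v):
    """Component-wise sum of two vectors given as tuples."""
    return tuple(a + b for a, b in zip(u, v))


def scale(alpha, u):
    """Scalar multiple alpha * u."""
    return tuple(alpha * a for a in u)


u = (1, 0, 1)   # a solution named in the text
v = (0, 1, 3)   # another solution named in the text
print(in_V(u), in_V(v))     # True True
print(in_V(add(u, v)))      # True: (1, 1, 4) is again a solution
print(in_V(scale(-4, u)))   # True: (-4, 0, -4) is again a solution
print(in_V(scale(0, u)))    # True: the zero vector
```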

Here is a very similar equation, however, whose solution set does not form a vector space. 


Example 2.4.10 Let $V \subseteq \mathbb{R}^3$ be the set of all solutions to the equation $x_1 + 3x_2 - x_3 = 5$. In this
case, $V$ is a plane in $\mathbb{R}^3$ that passes through the points $(0, 0, -5)$, $(1, 0, -4)$, and $(0, 1, -2)$. In other
words, it is the translation of the plane in Example 2.4.9 by 5 units down (in the negative $x_3$-direction).
The set $V$ is not a vector space. The proof is Exercise 13.


In fact, we can generalize the preceding examples by considering the set of solutions to linear
equations that are homogeneous or inhomogeneous.

Definition 2.4.11


We say that a linear equation, $a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$, is homogeneous if $b = 0$ and is
inhomogeneous if $b \neq 0$.


Example 2.4.12 Let $\{a_1, a_2, \ldots, a_n\} \subseteq \mathbb{R}$ and consider the set $V \subseteq \mathbb{R}^n$ of all solutions to the
homogeneous linear equation $a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$. The set $V$ together with the operations
$+$ and $\cdot$ inherited from $\mathbb{R}^n$ forms a vector space. The proof is Exercise 14.


Example 2.4.13 Is the set of all solutions to an inhomogeneous linear equation a vector space? In
Example 2.4.10, we considered a specific case where the solution set is not a vector space. In Exercise 20,
the reader is asked to show that the solution set to any inhomogeneous linear equation is not a vector
space. The key steps are to show that neither closure property holds.
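For the specific equation of Example 2.4.10, the failure of both closure properties can be seen with two of the points listed there. The following is a minimal sketch; the helper name `solves` is illustrative, and one numerical instance does not replace the general argument requested in Exercise 20.

```python
def solves(x, b):
    """True when x = (x1, x2, x3) satisfies x1 + 3*x2 - x3 = b."""
    x1, x2, x3 = x
    return x1 + 3 * x2 - x3 == b


p = (0, 0, -5)   # on the plane x1 + 3*x2 - x3 = 5
q = (1, 0, -4)   # also on that plane

total = tuple(a + b for a, b in zip(p, q))   # (1, 0, -9)
doubled = tuple(2 * a for a in p)            # (0, 0, -10)

print(solves(p, 5), solves(q, 5))  # True True
print(solves(total, 5))            # False: the left side equals 10
print(solves(doubled, 5))          # False: the left side equals 10
```

In both cases the left side of the equation evaluates to 10 rather than 5, so neither the sum nor the scalar multiple stays in the solution set.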


2.4.5 Other Vector Spaces 


In many areas of mathematics, we learn about concepts that relate to vector spaces, though the details 
of vector space properties may be simply assumed or not well established. In this section, we look at 
some of these concepts and recognize how vector space properties are present. 


Sequence Spaces 
Here, we explore sequences such as those discussed in Calculus. We consider the set of all sequences 
in the context of vector space properties. First, we give a formal definition of sequences. 


Definition 2.4.14 


A sequence of real numbers is a function $s : \mathbb{N} \to \mathbb{R}$. That is, $s(n) = a_n$ for $n = 1, 2, \ldots$ where
$a_n \in \mathbb{R}$. A sequence is denoted $\{a_n\}$. Let $S(\mathbb{R})$ be the set of all sequences. Let $\{a_n\}$ and $\{b_n\}$ be
sequences in $S(\mathbb{R})$ and $\alpha$ in $\mathbb{R}$. Define sequence addition and scalar multiplication with a sequence
by

$$\{a_n\} + \{b_n\} = \{a_n + b_n\} \quad\text{and}\quad \alpha \cdot \{a_n\} = \{\alpha a_n\}.$$


In Exercise 15, we show that $S(\mathbb{R})$, with these (element-wise) operations, forms a vector space
over $\mathbb{R}$.
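Since a sequence is a function on the natural numbers, the two operations can be sketched as operations on functions. This is a minimal illustration only: the names `seq_add` and `seq_scale` and the two sample sequences are assumptions made for the example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def seq_add(a, b):
    """Term-wise sum: the sequence n -> a(n) + b(n)."""
    return lambda n: a(n) + b(n)


def seq_scale(alpha, a):
    """Scalar multiple: the sequence n -> alpha * a(n)."""
    return lambda n: alpha * a(n)


def harmonic(n):
    return Fraction(1, n)      # a_n = 1/n


def squares(n):
    return n * n               # b_n = n^2


s = seq_add(harmonic, seq_scale(2, squares))   # s_n = 1/n + 2*n^2
print([str(s(n)) for n in range(1, 4)])        # ['3', '17/2', '55/3']
```

The result of each operation is again a function defined for every $n = 1, 2, \ldots$, which is the closure property the definition relies on.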


Example 2.4.15 (Eventually Zero Sequences) Let $S_{\mathrm{fin}}(\mathbb{R})$ be the set of all sequences that have a finite
number of nonzero terms. Then $S_{\mathrm{fin}}(\mathbb{R})$ is a vector space with operations as defined in Definition 2.4.14.
(See Exercise 16.)


We find vector space properties for sequences to be very useful in the development of calculus
concepts such as limits. For example, if we want to apply a limit to the sum of sequences, we need to
know that the sum of two sequences is indeed a sequence. More of these concepts will be discussed
later, after developing more linear algebra ideas.

Fig. 2.16 A graphical representation of a vector in $\mathcal{J}_k$.


Bar Graph Spaces 
Now we consider the set of bar graphs, one in which visualization is necessary for understanding the 
elements of the set. 


Definition 2.4.16 


Let $k \in \mathbb{N}$ and denote by $\mathcal{J}_k$ the set of all bar graphs with $k$ bins. Here, we consider a bar graph to be
a function from the set $\{1, \ldots, k\}$ to $\mathbb{R}$, and we visualize such an object in the familiar graphical way,
as shown in Figure 2.16. Define addition and scalar multiplication on such bar graphs as follows.
Let $J_1$ and $J_2$ be two bar graphs in $\mathcal{J}_k$ and let $\alpha$ be a scalar in $\mathbb{R}$.


• We define the 0 bar graph to be the bar graph where each of the $k$ bars has height 0.

• $J_1 + J_2$ is defined to be the bar graph obtained by summing the heights of the bars in
corresponding bins of $J_1$ and $J_2$.

• $\alpha \cdot J_1$ is defined to be the bar graph obtained by multiplying each bar height of $J_1$ by $\alpha$.


With this definition, we can verify that $\mathcal{J}_k$ is a vector space. Indeed, because addition and scalar
multiplication are bar-wise defined, and because the heights of the bars are real numbers, the properties
of Definition 2.3.5 are inherited from $\mathbb{R}$.

The space of bar graphs is actually a discrete function space with the added understanding that the 
domain of the functions has a geometric representation (the “bins” for the bar graphs are all lined up 
in order from 1 to k). 


Digital Characters 
The following is an example of a vector space over the field $\mathbb{Z}_2$.


Example 2.4.17 Consider the image set of 7-bar LCD characters, $D(\mathbb{Z}_2)$, where $\mathbb{Z}_2$ is the field that
includes only the scalar values 0 and 1. Figure 2.17 shows ten example characters along with the image
geometry.

Fig. 2.17 The ten digits of a standard 7-bar LCD display.


For these images, white corresponds to the value zero and green corresponds to the value one. With
element-wise definitions (given on page 41) of addition and scalar multiplication as defined for the
field $\mathbb{Z}_2$, $D(\mathbb{Z}_2)$ is a vector space. (See Exercise 19.) Here are two examples of vector addition in
$D(\mathbb{Z}_2)$: [images not reproduced]


What happens with scalar multiplication? What are the digital characters that can be obtained just
from scalar multiplication (without addition) using the characters in Figure 2.17?
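Addition in $D(\mathbb{Z}_2)$ can be sketched by encoding each character as seven values in $\{0, 1\}$ and adding bar by bar modulo 2. The segment order and the encodings of the characters below are assumptions made for this illustration; they follow a common 7-segment layout and are not taken from Figure 2.17.

```python
# Segment order (an assumption for this sketch): top, upper-left, upper-right,
# middle, lower-left, lower-right, bottom. 1 = lit (green), 0 = unlit (white).
ONE = (0, 0, 1, 0, 0, 1, 0)
SEVEN = (1, 0, 1, 0, 0, 1, 0)
EIGHT = (1, 1, 1, 1, 1, 1, 1)


def add_z2(x, y):
    """Bar-wise addition in Z_2: 1 + 1 = 0, so a bar lit in both inputs turns off."""
    return tuple((a + b) % 2 for a, b in zip(x, y))


print(add_z2(ONE, SEVEN))    # (1, 0, 0, 0, 0, 0, 0): only the top bar is lit
print(add_z2(EIGHT, EIGHT))  # (0, 0, 0, 0, 0, 0, 0): the zero character
```

Adding any character to itself gives the zero character, so in this space every element is its own additive inverse.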


2.4.6 Is My Set a Vector Space? 


Given a set and operations of addition and scalar multiplication, we would like to determine if the set 
is, or is not, a vector space. 
If we want to prove that the set is a vector space, we just show that it satisfies the definition. 


A set $V$ (with given operations of vector addition and scalar multiplication) is a vector space if it
satisfies each of the ten properties of Definition 2.3.5. One must show that these properties hold for
arbitrary elements of $V$ and arbitrary scalars in the field (usually $\mathbb{R}$).

In order to determine if a set $S$ is not a vector space, we need to show that the definition does not
hold for this set, or we need to show that a property possessed by all vector spaces does not hold for
the set.


A set $V$ (with given operations of vector addition and scalar multiplication) is not a vector space
if any one of the following statements is true.

1. For some element(s) in $V$ and/or scalar(s) in $\mathbb{R}$, any one of the ten properties of Definition 2.3.5
is not true.

2. For some elements $x$, $y$, and $z$ in $V$ with $x \neq y$, we have $x + z = y + z$.

3. The zero element of $V$ is not unique.

4. Some element of $V$ has an additive inverse that is not unique.

5. For some element $x$ in $V$, $0 \cdot x \neq 0$. That is, the zero scalar multiplied by some element of $V$
does not equal the zero element of $V$.

2.4.7. Exercises 


1. In the properties of a vector space, (P5) and (P6) refer to three different operations. What are they?

2. Suppose $V = \{v \in \mathbb{R} \mid v \neq 0\}$. Let $+$ and $\cdot$ be the standard operations in $\mathbb{R}$. Show why, when
considering whether $(V, +, \cdot)$ is a vector space over $\mathbb{R}$, we can say that $V$ is neither closed under
scalar multiplication nor closed under vector addition.

3. Show that $H_m(\mathbb{R})$, the set of all heat states with $m + 2$ real entries, is a vector space.

4. Plot a possible heat state $u$ for a rod with $m = 12$. In the same graph, plot a second heat state that
corresponds to $2u$. Describe the similarities and differences between $u$ and $2u$.

5. Plot a possible heat state $u$ for a rod with $m = 12$. In the same graph, plot the heat state that
corresponds to $u + v$, where $v$ is another heat state that is not a scalar multiple of $u$. Describe the
similarities and differences between $u$ and $u + v$.

6. A 12-megapixel phone camera creates images with 12,192,768 pixels, arranged in 3024 rows and
4032 columns, such as the one in Figure 3.9. The color in each pixel is determined by the relative
brightness of red, green, and blue light specified by three scalars. So rather than just a single scalar
determining grayscale intensity, each pixel is assigned 3 numbers that represent the intensities of
red, green, and blue light for the pixel. Verify that the space $V$ of such images is a vector space.

7. Let $P_2 = \{ax^2 + bx + c \mid a, b, c \in \mathbb{R}\}$. Show that $P_2$ is a vector space with scalars taken from $\mathbb{R}$
and addition and scalar multiplication defined in the standard way for polynomials.

8. Determine whether or not the following functions are vectors in $\mathcal{F}$ as defined in Example 2.4.1.

$$f(x) = \tan(x), \quad g(x) = x^5, \quad h(x) = \ln(x).$$

9. Show that the set of even functions (see Example 2.4.3) is a vector space.

10. Let $C^1([a, b])$ be the set of continuous functions whose first derivative is also continuous on
the interval $[a, b]$. Is $C^1([a, b])$ a vector space with the definitions of vector addition and scalar
multiplication given in Example 2.4.1?

11. Show that the set $P_n(\mathbb{R})$ of Example 2.4.2 is a vector space.

12. Define the set $P(\mathbb{R})$ to be the union $\bigcup_{n \in \mathbb{N}} P_n(\mathbb{R})$. Is this set a vector space, with the operations of
addition of polynomials and multiplication by constants? Justify your reasoning.

13. Let $V \subseteq \mathbb{R}^3$ denote the set of all solutions to the linear equation $x_1 + 3x_2 - x_3 = 5$.

(a) Use the algebraic definition of a vector space to show that $V$ is not a vector space.
(b) Give a geometric argument for the fact that $V$ is not a vector space.

14. Let $\{a_1, a_2, \ldots, a_n\} \subseteq \mathbb{R}$ and consider the set $V \subseteq \mathbb{R}^n$ of all solutions to the homogeneous linear
equation $a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$. Show that the set $V$ together with the operations $+$
and $\cdot$ inherited from $\mathbb{R}^n$ forms a vector space.

15. Show that the set $S(\mathbb{R})$ of Definition 2.4.14 is a vector space.

16. Show that the set $S_{\mathrm{fin}}(\mathbb{R})$ of Example 2.4.15 is a vector space.

17. For each of the following sets (and operations), determine if the set is a vector space. If the set is
a vector space, verify the axioms in the definition. If the set is not a vector space, find an example
to show that the set is not a vector space.

(a) The set of vectors in the upper half plane of $\mathbb{R}^2$, that is, the set of vectors in $\mathbb{R}^2$ whose $y$-
component is greater than or equal to zero. (Addition and scalar multiplication are inherited
from $\mathbb{R}^2$.)

(b) The set of images of the form below, with $a$, $b$, and $c$ real numbers.

$I = $ [image not reproduced]

(c) The set of upper triangular $3 \times 3$ matrices. A matrix $A$ is upper triangular if $a_{i,j} = 0$ whenever
$i > j$. (Addition and scalar multiplication are inherited from the space of $3 \times 3$ matrices.)

(d) The set of polynomials of any degree that go through the point $(2, 2)$. (Addition and scalar
multiplication are inherited from the space of polynomials.)

(e) The set of solutions to a homogeneous equation with 4 variables, considered as a subset of
$\mathbb{R}^4$. (Addition and scalar multiplication are inherited from $\mathbb{R}^4$.)

(f) The set of solutions to the equation $xy - x^2 = 0$, considered as a subset of $\mathbb{R}^2$. (Addition and
scalar multiplication are inherited from $\mathbb{R}^2$.)

(g) The set of real-valued integrable functions on the interval $[a, b]$. (Addition and scalar
multiplication are defined pointwise.)

(h) The set of (continuous) heat signatures on a rod of length $L$ with at most 3 peaks (local maxima).
(Addition and scalar multiplication are inherited from the vector space of heat signatures.)

18. We say that $W$ is a subset of $V$ if every element of $W$ is an element of $V$. In the case where $W$ is
a subset of $V$, we write $W \subseteq V$. If $W \subseteq V$ and $W$ does not contain all of the elements of $V$, we
say $W$ is a proper subset of $V$. Now, consider a vector space $(V, +, \cdot)$. Which of the 10 vector
space properties are not necessarily true for a proper subset $W \subset V$?

19. Prove that $D(\mathbb{Z}_2)$ from Example 2.4.17 is a vector space.


20. Show that neither closure property holds for the set

$$V = \{(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n \mid a_1x_1 + a_2x_2 + \cdots + a_nx_n = b,\ b \neq 0\}.$$
21. Let $V$ be the finite vector space given by

$$V = \{a, b, c, d\}.$$

Define $\oplus$, vector addition, and $\otimes$, scalar multiplication, according to the following tables:

$$\begin{array}{c|cccc} \oplus & a & b & c & d \\ \hline a & a & b & c & d \\ b & b & a & d & c \\ c & c & d & a & b \\ d & d & c & b & a \end{array} \qquad\qquad \begin{array}{c|cccc} \otimes & a & b & c & d \\ \hline 0 & a & a & a & a \\ 1 & a & b & c & d \end{array}$$

One can verify by inspection that each of the vector space properties holds for $V$.

(a) What element is the additive identity for $V$?
(b) What is the additive inverse of $a$? Of $c$?

22. Consider the set $\mathbb{R}$ and the operations $\oplus$ and $\odot$ defined on $\mathbb{R}$ as follows.
Let $u, v \in \mathbb{R}$ be vectors. Define $u \oplus v = u + v - 3$.
Let $\alpha \in \mathbb{R}$ be a scalar and $u \in \mathbb{R}$ be a vector. Define $\alpha \odot u = \alpha u / 2$.

Is $(\mathbb{R}, \oplus, \odot)$ a vector space over $\mathbb{R}$? Justify your response with a proof or a counterexample.

23. Show that the closure properties of Definition 2.3.5 hold true for the set of functions
$(C_0([a, b]), +, \cdot)$. Use this result to state how you know that $(C_0([a, b]), +, \cdot)$ is a vector space
over $\mathbb{R}$.

24. Let $V$ be the set of vectors defined below:

$V = $ [set of symbols not reproduced]

(a) Define $\oplus$, vector addition, and $\otimes$, scalar multiplication, so that $(V, \oplus, \otimes)$ is a vector space
over $\mathbb{Z}_2$. Prove that $(V, \oplus, \otimes)$ is indeed a vector space with the definitions you make.

(b) Give an example where $V$ might be of interest in the real world. (You can be imaginative
enough to think of this set as a very simplified version of something bigger.)

(c) Suppose we add another vector to $V$ to get $\tilde{V}$. Is it possible to define $\oplus$ and $\otimes$ so that $(\tilde{V}, \oplus, \otimes)$
is a vector space over $\mathbb{Z}_2$? Justify.

(d) Is it possible to define $\oplus$ and $\otimes$ so that $(V, \oplus, \otimes)$ is a vector space over $\mathbb{R}$? Justify.


2.5 Subspaces 


PetPics, a pet photography company specializing in portraits, wants to post photos for clients to review,
but to protect their artistic work, they only post electronic versions that have copyright text. The text is
added to all images produced by the company by overwriting the appropriate pixels with zeros, as
shown in Figure 2.18. Only pictures that have zeros in these pixels are considered legitimate images.

Fig. 2.18 Example proof photo from PetPics, with copyright text.

The company also wants to allow clients to make some adjustments to the pictures: the adjustments 
include brightening/darkening, and adding background or little figures like hearts, flowers, or squirrels. 
It turns out that these operations can all be accomplished by adding other legitimate images and 
multiplying by scalars, as defined in Section 2.3. 

It is certainly true that the set of all legitimate images of the company’s standard $(m \times n)$-pixel size
is contained in the vector space $\mathcal{I}_{m\times n}$ of all $m \times n$ images, so we could mathematically work in this
larger space. But, astute employees of the company who enjoy thinking about linear algebra notice 
that actually the set of legitimate images itself satisfies the 10 properties of a vector space. Specifically, 
adding any two images with the copyright text (for example, adding a squirrel to a portrait of a golden 
retriever) produces another image with the same copyright text, and multiplying an image with the 
copyright text by a scalar (say, to brighten it) still results in an image with the copyright text. Hence, 
it suffices to work with the smaller set of legitimate images. 

In fact, very often the sets of objects that we want to focus on are actually only subsets of larger 
vector spaces, and it is useful to know when such a set forms a vector space separately from the larger 
vector space. 

Here are some examples of subsets of vector spaces that we have encountered so far. 


1. Solution sets of homogeneous linear equations, with $n$ variables, are subsets of $\mathbb{R}^n$.

2. Radiographs are images with nonnegative values and represent a subset of the larger vector space
of images with the given geometry.

3. The set of even functions on $\mathbb{R}$ is a subset of the vector space of functions on $\mathbb{R}$.

4. Polynomials of order 3 form a subset of the vector space $P_5(\mathbb{R})$.

5. Heat states on a rod in a diffusion welding process (the collection of which is $H_m(\mathbb{R})$) form a
subset of all possible heat states because the temperature is fixed at the ends of the rod.

6. The set of sequences with exactly 10 nonzero terms is a subset of the set of sequences with a finite
number of nonzero terms.


Even though operations like vector addition and scalar multiplication on the subset are typically
the same as the operations on the larger parent spaces, we still often wish to work in the smaller more
relevant subset rather than thinking about the larger ambient space. When does the subset behave like
a vector space in its own right? In general, when is a subset of a vector space also a vector space?

2.5.1 Subsets and Subspaces


Let $(V, +, \cdot)$ be a vector space. In this section we discuss conditions on a subset of a vector space that
will guarantee the subset is also a vector space. Recall that a subset of $V$ is a set that contains some of
the elements of $V$. We define subset more precisely here.


Definition 2.5.1 


Let $V$ and $W$ be sets. We say that $W$ is a subset of $V$ if every element of $W$ is an element of $V$ and
we write $W \subset V$ or $W \subseteq V$. In the case where $W \neq V$ (there are elements of $V$ that are not in $W$),
we say that $W$ is a proper subset of $V$ and we write $W \subsetneq V$.


In a vector space context, we always assume the same operations on W as we have defined on V. 
Let W be a subset of V. We are interested in subsets that also satisfy the vector space properties 
(recall Definition 2.3.5). 


Definition 2.5.2 


Let $(V, +, \cdot)$ be a vector space over a field $F$. If $W \subseteq V$, then we say that $W$ is a subspace of
$(V, +, \cdot)$ whenever $(W, +, \cdot)$ is also a vector space.


Now consider which vector space properties of $(V, +, \cdot)$ must also be true of the subset $W$. Which
properties are not necessarily true? The commutative, associative, and distributive properties still hold
because the operations are the same, the scalars come from the same scalar field, and elements of $W$
come from the set $V$. Therefore, since these properties are true in $V$, they are true in $W$. We say that
these properties are inherited from $V$ since $V$ is like a parent set to $W$. Also, since we do not change
the scalar set when considering a subset, the scalar 1 is still an element of the scalar set. This tells us
that we can determine whether a subset of a vector space is, itself, a vector space, by checking those
properties that depend on how the subset differs from the parent vector space. The properties we need
to check are the following:


(P1) W is closed under addition. 

(P2) W is closed under scalar multiplication. 
(P8) W contains the additive identity, denoted 0. 
(P9) W contains additive inverses. 


With careful consideration, we see that, because V contains additive inverses, then if (P1), (P2), and 
(P8) are true for W, it follows that W must also contain additive inverses (see Exercise 14). Hence, as 
the following theorem states, we need only test for properties (P1), (P2), and (P8) in order to determine 
whether a subset is a subspace. 


Theorem 2.5.3 
Let $(V, +, \cdot)$ be a vector space over a field $F$ and let $W$ be a subset of $V$. Then $(W, +, \cdot)$ is a
subspace of $V$ over $F$ if and only if $0 \in W$ and $W$ is closed under vector addition and scalar
multiplication.

The following corollary is based on the fact that as long as $W$ is nonempty and satisfies (P1) and
(P2), then $W$ automatically satisfies (P8).


Corollary 2.5.4
Let $(V, +, \cdot)$ be a vector space over a field $F$ and let $W$ be a nonempty subset of $V$. Then
$(W, +, \cdot)$ is a subspace of $V$ over $F$ if and only if $W$ is closed under vector addition and scalar
multiplication.


The proof is the subject of Exercise 15.

While Corollary 2.5.4 presents a somewhat simplified test for determining if a subset is a subspace,
the additional condition in Theorem 2.5.3 presents a very simple test for possibly determining if a
subset is not a subspace. Namely, if $0 \notin W$ then $W$ is not a subspace. When discussing subspaces, we
tend to leave off using the notation $(W, +, \cdot)$ and write $W$ instead because the operations are understood
in the context of $(V, +, \cdot)$.

The following two results give convenient ways to check for closure under addition and scalar 
multiplication. 


Theorem 2.5.5 

Let $(V, +, \cdot)$ be a vector space over a field $F$ and let $X$ be a nonempty subset of $V$. Then $X$ is closed under
both addition and scalar multiplication if and only if for all pairs of vectors $x, y \in X$ and for all
scalars $\alpha, \beta \in F$, $\alpha x + \beta y \in X$.


This theorem makes use of an “if and only if” statement. The proof relies on showing both


• $(\Rightarrow)$ If $X$ is closed under both addition and scalar multiplication, then for all $x, y \in X$ and for all
$\alpha, \beta \in F$, $\alpha x + \beta y \in X$ and

• $(\Leftarrow)$ If for all $x, y \in X$ and for all $\alpha, \beta \in F$, $\alpha x + \beta y \in X$, then $X$ is closed under both addition
and scalar multiplication.


Proof. Let $(V, +, \cdot)$ be a vector space over a field $F$ and let $X$ be a nonempty subset of $V$.

$(\Rightarrow)$ Suppose $X$ is closed under addition and scalar multiplication, and that $x$ and $y$ are in $X$ and $\alpha$
and $\beta$ are in $F$. Since $X$ is closed under scalar multiplication, we know that $\alpha x$ and $\beta y$ are in $X$. But
then since $X$ is also closed under addition, we conclude that $\alpha x + \beta y$ is in $X$. Since this is true for all
$\alpha, \beta \in F$ and $x, y \in X$, we conclude that $X$ has the property that for all $x, y \in X$ and for all $\alpha, \beta \in F$,
$\alpha x + \beta y \in X$.


$(\Leftarrow)$ Suppose $X$ has the property that for all $x, y \in X$ and for all $\alpha, \beta \in F$, $\alpha x + \beta y \in X$. Now let
$x \in X$ and $\alpha \in F$, and fix $\beta = 0 \in F$ and $y = x$. Then $\alpha x + \beta y = \alpha x + 0 \cdot x = \alpha x + 0 = \alpha x$ is in $X$.
Hence $X$ is closed under scalar multiplication. Finally, let $x, y \in X$ and $\alpha = \beta = 1$, the multiplicative
identity of $F$. Then we can also conclude that $\alpha x + \beta y = 1x + 1y = x + y$ is in $X$, and so $X$ is closed
under addition.


It is often even simpler to check the conditions for the following similar result.

Corollary 2.5.6

Consider a set of objects $X$, scalars in $\mathbb{R}$, and operations $+$ (addition) and $\cdot$ (scalar multiplication)
defined on $X$. Then $X$ is closed under both addition and scalar multiplication if and only if
$\alpha \cdot x + y \in X$ for all $x, y \in X$ and each $\alpha \in \mathbb{R}$.


The proof is the subject of Exercise 16. 


2.5.2 Examples of Subspaces 


Every vector space $(V, +, \cdot)$ has at least the following two subspaces.


Theorem 2.5.7
Let $(V, +, \cdot)$ be a vector space. Then $V$ is itself a subspace of $(V, +, \cdot)$.


Proof. Since every set is a subset of itself, the result follows from Definition 2.5.2.


Theorem 2.5.8
Let $(V, +, \cdot)$ be a vector space. Then the set $\{0\}$ is a subspace of $(V, +, \cdot)$.


The proof is Exercise 19.


Example 2.5.9 Recall Example 2.4.9 from the last section. Let $V \subseteq \mathbb{R}^3$ be the set of all solutions to
the equation $x_1 + 3x_2 - x_3 = 0$. Then $V$ is a subspace of $\mathbb{R}^3$, with the standard operations.


More generally, as we saw in the last section, the set of solutions to any homogeneous linear equation
with $n$ variables is a subspace of $(\mathbb{R}^n, +, \cdot)$.


Example 2.5.10 Consider the coordinate axes as a subset of the vector space $\mathbb{R}^2$. That is, let $T \subseteq \mathbb{R}^2$
be defined by

$$T = \{x = (x_1, x_2) \in \mathbb{R}^2 \mid x_1 = 0 \text{ or } x_2 = 0\}.$$

$T$ is not a subspace of $(\mathbb{R}^2, +, \cdot)$, because although $0$ is in $T$, making $T \neq \emptyset$, $T$ does not have the property
that for all $x, y \in T$ and for all $\alpha, \beta \in \mathbb{R}$, $\alpha x + \beta y \in T$. To verify this, we need only produce one
example of vectors $x, y \in T$ and scalars $\alpha, \beta \in \mathbb{R}$ so that $\alpha x + \beta y$ is not in $T$. Notice that $x = (0, 1)$,
$y = (1, 0)$ are elements of $T$ and $\alpha = \beta = 1$ are in $\mathbb{R}$. Since $1 \cdot x + 1 \cdot y = (1, 1)$, which is not in $T$,
$T$ does not satisfy the subspace property.
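The counterexample above can be replayed in a few lines. This is a minimal sketch; the helper names `in_T` and `combo` are illustrative and not part of the text.

```python
def in_T(x):
    """True when x = (x1, x2) lies on a coordinate axis of R^2."""
    return x[0] == 0 or x[1] == 0


def combo(alpha, x, beta, y):
    """Return alpha*x + beta*y for vectors given as tuples."""
    return tuple(alpha * a + beta * b for a, b in zip(x, y))


x, y = (0, 1), (1, 0)
print(in_T(x), in_T(y), in_T((0, 0)))   # True True True
s = combo(1, x, 1, y)
print(s, in_T(s))                        # (1, 1) False
```

A single failing combination is enough to rule out the subspace property, even though the zero vector belongs to the set.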


Example 2.5.11 Consider $W = \{(a, b, c) \in \mathbb{R}^3 \mid c = 0\}$. $W$ is a subspace of $\mathbb{R}^3$, with the standard
operations of addition and scalar multiplication. See Exercise 9.

Example 2.5.12 Consider the set of images

$$V = \{I \mid I \text{ is of the form shown in Figure 2.19 and } a, b, c \in \mathbb{R}\}.$$

$V$ is a subspace of the space of images with the same geometric configuration.


Proof. Let $V$ be defined as above. We will show that $V$ satisfies the hypotheses of Theorem 2.5.3. We
can see that the set of images $V$ is a subset of the vector space of all images with the same geometric
configuration.

Next, notice that $V$ is nonempty since the image in $V$ with $a = b = c = 0$ is the zero image. Now,
we need to show that the set is closed under scalar multiplication and addition. Let $\alpha \in \mathbb{R}$ be a scalar
and let $I_1$ and $I_2$ be images in $V$. For the image $I_1$, let $a = a_1$, $b = b_1$, and $c = c_1$, and for the image
$I_2$, let $a = a_2$, $b = b_2$, and $c = c_2$ (Figure 2.20).

We can see that $\alpha I_1 + I_2$ is also in $V$ with $a = \alpha a_1 + a_2$, $b = \alpha b_1 + b_2$, and $c = \alpha c_1 + c_2$. For
example, the pixel intensity in the pixel on the bottom left of $\alpha I_1 + I_2$ is

$$\alpha(a_1 - c_1) + a_2 - c_2 = \alpha a_1 + a_2 - (\alpha c_1 + c_2) = a - c.$$


[Figure: an image $I$ whose pixel intensities include the expressions $2a$, $a$, $a - b$, $a + b$, $b$, $a - c$, and $c$.]

Fig. 2.19 The form of images in the set $V$ of Example 2.5.12.


[Figure: two images $I_1$ and $I_2$ of the same form, with $a, b, c$ replaced by $a_1, b_1, c_1$ and by $a_2, b_2, c_2$, respectively.]

Fig. 2.20 Arbitrary images $I_1$ and $I_2$ in the set $V$ of Example 2.5.12.

Similarly, we can see that each pixel intensity satisfies the form shown in Figure 2.19 for the chosen
values of $a$, $b$, and $c$. (Try a few yourself to be sure you agree.) Thus $V$ is a subspace of the space of
images with this geometric arrangement of ten pixels.


This result also means that $V$ is a vector space.


Example 2.5.13 Is the set $V = \{ax^2 + bx + c \mid a > 0\}$ a vector space? Clearly, $V$ is a subset of the
vector space $(P_2, +, \cdot)$. However, the zero vector of $P_2$, $0(x) = 0x^2 + 0x + 0$, is not in $V$. So, by
Theorem 2.5.3, $V$ is not a vector space.


For the next example, let us first define the transpose of a matrix. 


Definition 2.5.14 


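Let $A$ be the $r \times c$ matrix given by

$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,c} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,c} \\ \vdots & \vdots & & \vdots \\ a_{r,1} & a_{r,2} & \cdots & a_{r,c} \end{pmatrix}.$$

Then, the transpose of $A$, denoted $A^T$, is the $c \times r$ matrix that is given by

$$A^T = \begin{pmatrix} a_{1,1} & a_{2,1} & \cdots & a_{r,1} \\ a_{1,2} & a_{2,2} & \cdots & a_{r,2} \\ \vdots & \vdots & & \vdots \\ a_{1,c} & a_{2,c} & \cdots & a_{r,c} \end{pmatrix}.$$

In words, $A^T$ is the matrix whose rows are made from the columns of $A$ and whose columns are
made from the rows of $A$.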
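The "rows become columns" description can be sketched directly. In this minimal example a matrix is assumed to be stored as a list of rows, and the function name `transpose` is illustrative.

```python
def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


A = [[1, 2, 3],
     [4, 5, 6]]                      # a 2 x 3 matrix
print(transpose(A))                  # [[1, 4], [2, 5], [3, 6]], a 3 x 2 matrix
print(transpose(transpose(A)) == A)  # True
```

Here `zip(*A)` pairs up the first entries of every row, then the second entries, and so on, so each column of `A` becomes a row of the result.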


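Example 2.5.15 Let $M = \{u \in M_{2\times 2}(\mathbb{R}) \mid u = u^T\}$. That is, $M$ is the set of all $2 \times 2$ matrices with
real-valued entries that are equal to their own transpose. The set $M$ is a subspace of $M_{2\times 2}(\mathbb{R})$.


Proof. Let $M$ be defined as above. First, we note that $M$ is a subset of the vector space $M_{2\times 2}(\mathbb{R})$,
with the standard operations. We will show that $M$ satisfies Theorem 2.5.3. Notice that $M$ contains the
zero vector of $M_{2\times 2}(\mathbb{R})$, the two by two matrix of all zeros. Next, consider two arbitrary vectors $u$
and $v$ in $M$ and arbitrary scalar $\alpha$ in $\mathbb{R}$. We can show that $\alpha u + v = (\alpha u + v)^T$ so that $\alpha u + v$ is also
in $M$. Indeed, let

$$u = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad\text{and}\quad v = \begin{pmatrix} e & f \\ g & h \end{pmatrix}.$$

Then since $u = u^T$ and $v = v^T$, we know that $b = c$ and $f = g$. Thus

$$\alpha u + v = \alpha \begin{pmatrix} a & b \\ b & d \end{pmatrix} + \begin{pmatrix} e & f \\ f & h \end{pmatrix} = \begin{pmatrix} \alpha a + e & \alpha b + f \\ \alpha b + f & \alpha d + h \end{pmatrix} = \begin{pmatrix} \alpha a + e & \alpha b + f \\ \alpha b + f & \alpha d + h \end{pmatrix}^T = (\alpha u + v)^T.$$

By Corollary 2.5.6, $M$ is closed under addition and scalar multiplication. Thus, $M$ is a subspace of $M_{2\times 2}(\mathbb{R})$.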
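The computation in the proof can be tried on concrete symmetric matrices. The following is a minimal sketch; the function names and the sample matrices are assumptions chosen for illustration.

```python
def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def is_symmetric(A):
    """True when A equals its own transpose."""
    return A == transpose(A)


def alpha_u_plus_v(alpha, u, v):
    """Entry-wise computation of alpha*u + v."""
    return [[alpha * a + b for a, b in zip(ru, rv)] for ru, rv in zip(u, v)]


u = [[1, 2], [2, 5]]      # symmetric
v = [[0, -3], [-3, 4]]    # symmetric
w = alpha_u_plus_v(7, u, v)
print(w)                  # [[7, 11], [11, 39]]
print(is_symmetric(w))    # True
```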


2.5.3 Subspaces of IR” 


In Section 2.3, we introduced the $n$-dimensional Euclidean spaces $\mathbb{R}^n$. One nice aspect of these spaces
is that they lend themselves to visualization: the space $\mathbb{R} = \mathbb{R}^1$ just looks like a number line, $\mathbb{R}^2$ is
visualized as the $xy$-plane, and $\mathbb{R}^3$ can be represented with $x$-, $y$-, and $z$-axes as pictured in Figure 2.21.
For larger $n$, the space is harder to visualize, because we would have to add more axes and the world
we live in is very much like $\mathbb{R}^3$. As we continue our study of linear algebra, you will start to gain more
intuition about how to think about these higher dimensional spaces.


Fig. 2.21 Geometric representations of $\mathbb{R}^1$ (Top), $\mathbb{R}^2$ (Middle), and $\mathbb{R}^3$ (Bottom).

In the last section we looked at a few examples of subspaces of $\mathbb{R}^n$. In this section we explore the
geometry of such subspaces.


Theorem 2.5.16 
Let $L$ be a line through the origin in $\mathbb{R}^n$. Then $L$ is a subspace of $(\mathbb{R}^n, +, \cdot)$.


Proof. A line through the origin in $\mathbb{R}^n$ is represented as the set $L = \{\beta v \mid \beta \in \mathbb{R}\}$ for some fixed
nonzero vector $v \in \mathbb{R}^n$. We show $L$ is a subspace using the property given in Corollary 2.5.6.
Let $w_1 = \beta_1 v$ and $w_2 = \beta_2 v$ be two vectors in $L$ and let $\alpha \in \mathbb{R}$ be a scalar. Then

$$\alpha w_1 + w_2 = \alpha\beta_1 v + \beta_2 v = (\alpha\beta_1 + \beta_2)v,$$

which is again a scalar multiple of $v$, and so $L$ is closed under addition and scalar multiplication.
Also, notice that $L$ is nonempty. Thus, by Corollary 2.5.4, $L$ is a subspace of $(\mathbb{R}^n, +, \cdot)$.


As we explore higher dimensions, we find that planes through the origin (and in fact any geometric
space described by the solutions of a homogeneous system of linear equations) are also vector spaces.
We define a plane as the set of all linear combinations of two (not collinear) vectors in $\mathbb{R}^3$. We leave it
as Exercise 10 to write the proof of this fact, as it closely follows the proof of Theorem 2.5.16.


2.5.4 Building New Subspaces 


In this section, we investigate the question, “If we start with two subspaces of the same vector space, 
in what ways can we combine the vectors to obtain another subspace?” Our investigation will lead us 
to some observations that will simplify some previous examples and give us new tools for proving that 
subsets are subspaces. We first consider intersections and unions. 


Definition 2.5.17 
Let $S$ and $T$ be sets.


• The intersection of $S$ and $T$, written $S \cap T$, is the set containing all elements that are in both
$S$ and $T$.

• The union of $S$ and $T$, written $S \cup T$, is the set containing all elements that are in either $S$ or
$T$ (or both).


The intersection of two subspaces is also a subspace.


Theorem 2.5.18
Let $W_1$ and $W_2$ be any two subspaces of a vector space $(V, +, \cdot)$. Then $W_1 \cap W_2$ is a subspace
of $(V, +, \cdot)$.

Proof. Let $W_1$ and $W_2$ be subspaces of $(V, +, \cdot)$. We will show that the intersection of $W_1$ and $W_2$
is nonempty and closed under scalar multiplication and vector addition. To show that $W_1 \cap W_2$ is
nonempty, we notice that since both $W_1$ and $W_2$ contain the zero vector, so does $W_1 \cap W_2$.

Now, let $u$ and $v$ be elements of $W_1 \cap W_2$ and let $\alpha$ be a scalar. Since $W_1$ and $W_2$ are closed under
addition and scalar multiplication, we know that $\alpha \cdot u + v$ is also in both $W_1$ and $W_2$. That is, $\alpha \cdot u + v$
is in $W_1 \cap W_2$, so by Corollary 2.5.6 $W_1 \cap W_2$ is closed under addition and scalar multiplication.

Thus, by Corollary 2.5.4, $W_1 \cap W_2$ is a subspace of $(V, +, \cdot)$.


An important example involves solutions to homogeneous equations, which we first considered in 
Example 2.4.12. 


Example 2.5.19 The solution set of a single homogeneous equation in $n$ variables is a subspace of $\mathbb{R}^n$
(see Example 2.4.12). By Theorem 2.5.18, the intersection of the solution sets of any $k$ homogeneous
equations in $n$ variables is also a subspace of $\mathbb{R}^n$.


In other words, if a system of linear equations consists only of homogeneous equations, then the set
of solutions forms a subspace of $\mathbb{R}^n$. This is such an important result that we promote it from example
to theorem.


Theorem 2.5.20
The solution set of a homogeneous system of equations in $n$ variables is a subspace of $\mathbb{R}^n$.


The proof of this theorem closely follows the argument of Example 2.5.19 and is left to the reader. 
Theorem 2.5.18 also provides a new method for determining if a subset is a subspace. If the subset can 
be expressed as the intersection of known subspaces, then it also is a subspace. 


Example 2.5.21 Consider the vector space of functions defined on R, denoted F. We have shown that 
the set of even functions on R and the set of continuous functions on R are both subspaces. Thus, by 
Theorem 2.5.18, the set of continuous even functions on R is also a subspace of F. 


The union of subspaces, on the other hand, need not be a subspace. Recall Example 2.5.10, where
we showed that the coordinate axes in $\mathbb{R}^2$ are not a subspace. Because the coordinate axes can be
written as the union of the $x$-axis and the $y$-axis, both of which are subspaces of $\mathbb{R}^2$, it is clear that
unions of subspaces need not be subspaces.
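The failure of closure for the coordinate axes can be checked directly. The following is a minimal sketch, not part of the original text; the helper names `on_x_axis`, `on_y_axis`, `in_union`, and `add` are hypothetical and chosen only for illustration. It picks one vector from each axis and tests whether their sum stays in the union.

```python
def on_x_axis(v):
    """True when v = (a, 0) for some real a."""
    return v[1] == 0


def on_y_axis(v):
    """True when v = (0, b) for some real b."""
    return v[0] == 0


def in_union(v):
    """Membership in the union of the two coordinate axes of R^2."""
    return on_x_axis(v) or on_y_axis(v)


def add(u, v):
    """Standard component-wise addition in R^2."""
    return (u[0] + v[0], u[1] + v[1])


u = (1, 0)   # lies on the x-axis
v = (0, 1)   # lies on the y-axis
s = add(u, v)

# Both summands are in the union, but their sum is on neither axis.
print(in_union(u), in_union(v), in_union(s))
```

A single pair of vectors whose sum leaves the set is enough to show that the set is not closed under addition.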

In fact, the union of subspaces is generally not a subspace. The next theorem tells us that the union 
of subspaces is a subspace only when one subspace is a subset of the other. 


Theorem 2.5.22
Let $W_1$ and $W_2$ be any two subspaces of a vector space $V$. Then $W_1 \cup W_2$ is a subspace of $V$ if
and only if $W_1 \subseteq W_2$ or $W_2 \subseteq W_1$.

Proof. Suppose $W_1$ and $W_2$ are subspaces of a vector space $V$.

($\Leftarrow$) First, suppose that $W_1 \subseteq W_2$. Then, we know that $W_1 \cup W_2 = W_2$, which is a subspace.
Similarly, if $W_2 \subseteq W_1$ then $W_1 \cup W_2 = W_1$, which is a subspace.

($\Rightarrow$) We will show the contrapositive. That is, we assume that neither subspace ($W_1$ nor $W_2$) contains
the other. We will show that the union, $W_1 \cup W_2$, is not a subspace. Since $W_1 \not\subseteq W_2$ and $W_2 \not\subseteq W_1$, we
know that there is a vector $u_1 \in W_1$ with $u_1 \notin W_2$, and there is another vector $u_2 \in W_2$ with $u_2 \notin W_1$.
Since $W_1$ and $W_2$ both contain additive inverses and both are closed under addition, $u_1 + u_2 \notin W_1$ and $u_1 + u_2 \notin W_2$.
(If $u_1 + u_2$ were in $W_1$, then $u_2 = (u_1 + u_2) + (-u_1)$ would also be in $W_1$; the argument for $W_2$ is symmetric.)
So, $u_1, u_2 \in W_1 \cup W_2$, but $u_1 + u_2 \notin W_1 \cup W_2$. Thus, $W_1 \cup W_2$ is not closed under addition and
hence is not a subspace of $V$.


Another way to combine sets is to take the so-called sum of the sets. In words, the sum of two sets 
is the set formed by adding all combinations of pairs of elements, one from each set. We define this 
more rigorously in the following definition. 


Definition 2.5.23 


Let $U_1$ and $U_2$ be sets. We define the sum of $U_1$ and $U_2$, denoted $U_1 + U_2$, to be the set
$\{u_1 + u_2 \mid u_1 \in U_1,\ u_2 \in U_2\}$.


Two sets need not be the same size in order to be added, and their sum can be (but is not necessarily)
strictly larger than each of the two summand sets. This last comment can be
seen in Example 2.5.24.


Example 2.5.24 Let $U_1 = \{3, 4, 5\}$ and $U_2 = \{1, 3\}$. Then

$$\begin{aligned}
U_1 + U_2 &= \{u_1 + u_2 \mid u_1 \in U_1,\ u_2 \in U_2\} \\
&= \{3+1,\ 3+3,\ 4+1,\ 4+3,\ 5+1,\ 5+3\} \\
&= \{4, 5, 6, 7, 8\}.
\end{aligned}$$
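The computation in Example 2.5.24 can be reproduced with a short sketch. The function name `set_sum` is a hypothetical helper introduced here only for illustration; it forms every pairwise sum with one element taken from each set, as in Definition 2.5.23.

```python
def set_sum(U1, U2):
    """Sum of two sets: all u1 + u2 with u1 in U1 and u2 in U2."""
    return {u1 + u2 for u1 in U1 for u2 in U2}


U1 = {3, 4, 5}
U2 = {1, 3}
total = set_sum(U1, U2)

# Six pairs are added, but 3 + 3 and 5 + 1 both give 6,
# so the resulting set has only five elements.
print(sorted(total))
```

Because a set stores each element once, repeated sums collapse, which is why the sum of a 3-element set and a 2-element set here has 5 elements rather than 6.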


Example 2.5.25 Let $U_1$ be the set of all scalar multiples of image A (see page 11). Similarly, let $U_2$
be the set of all scalar multiples of image B. We can show that $U_1 + U_2$ is a subspace of the vector
space of $4 \times 4$ grayscale images. (See Exercise 24.)


In both of the previous two examples we see that the sum of two sets can contain elements that are
not elements of either set. This means that the sum of two sets is not necessarily equal to the union of
the two sets. In Example 2.5.25 the sets $U_1$ and $U_2$ are both subspaces and their sum $U_1 + U_2$ is also
a subspace. This leads us to consider whether this is always true. We see, in the next theorem, that the
answer is yes.


Theorem 2.5.26
Let $W_1$ and $W_2$ be subspaces of a vector space $(V, +, \cdot)$. Then $W_1 + W_2$ is a subspace of $(V, +, \cdot)$.


Proof. Let $S = W_1 + W_2$. We will show that $S$ contains the zero vector and is closed under addition
and scalar multiplication.

Since $W_1$ and $W_2$ are subspaces of $(V, +, \cdot)$, both contain the zero vector. Now, since $0$ is the additive
identity, we have $0 = 0 + 0 \in S$.

Let $x = x_1 + x_2$ and $y = y_1 + y_2$ be arbitrary elements of $S$ with $x_1, y_1 \in W_1$ and $x_2, y_2 \in W_2$,
and let $\alpha$ be an arbitrary scalar. Notice that, since $\alpha x_1 + y_1 \in W_1$ and $\alpha x_2 + y_2 \in W_2$,

$$\alpha x + y = (\alpha x_1 + y_1) + (\alpha x_2 + y_2) \in S.$$

Thus, by Corollary 2.5.6, $S$ is also closed under scalar multiplication and vector addition. By
Theorem 2.5.3, $S$ is a subspace of $V$.


In words, Theorem 2.5.26 tells us that the sum of subspaces always results in a subspace. Of special 
interest are subspaces of the type considered in Example 2.5.25. That is, we are interested in the sum 
$U_1 + U_2$ for which $U_1 \cap U_2 = \{0\}$.


Definition 2.5.27 


Let $U_1$ and $U_2$ be subspaces of a vector space $(V, +, \cdot)$ such that $U_1 \cap U_2 = \{0\}$ and $V = U_1 + U_2$.
We say that $V$ is the direct sum of $U_1$ and $U_2$ and denote this by $V = U_1 \oplus U_2$.


Because the direct sum is a special case of the sum of two sets, we know that the direct sum of two 
subspaces is also a subspace. 


Example 2.5.28 Let $U_1 = \{(a, 0) \in \mathbb{R}^2 \mid a \in \mathbb{R}\}$ and $U_2 = \{(0, b) \in \mathbb{R}^2 \mid b \in \mathbb{R}\}$. $U_1$ can be represented
by the set of all vectors along the $x$-axis and $U_2$ can be represented by the set of all vectors along
the $y$-axis. Both $U_1$ and $U_2$ are subspaces of $\mathbb{R}^2$, with the standard operations. And, $U_1 \oplus U_2 = \mathbb{R}^2$.


Example 2.5.29 Consider the sets 


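$$\begin{aligned}
U_1 &= \{ax^2 \in \mathcal{P}_2(\mathbb{R}) \mid a \in \mathbb{R}\}, \\
U_2 &= \{ax \in \mathcal{P}_2(\mathbb{R}) \mid a \in \mathbb{R}\}, \\
U_3 &= \{a \in \mathcal{P}_2(\mathbb{R}) \mid a \in \mathbb{R}\}.
\end{aligned}$$


Notice that each subset is a subspace of $(\mathcal{P}_2(\mathbb{R}), +, \cdot)$. Notice also that $\mathcal{P}_2(\mathbb{R}) = U_1 + U_2 + U_3$.
Furthermore, each pair of subsets has the trivial intersection $\{0\}$. Thus, we have $\mathcal{P}_2(\mathbb{R}) = (U_1 \oplus U_2) \oplus U_3$.
We typically write $\mathcal{P}_2(\mathbb{R}) = U_1 \oplus U_2 \oplus U_3$ with the same understanding.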


Examples 2.5.28 and 2.5.29 suggest that there might be a way to break apart a vector space into the 
direct sum of subspaces. This fact will become very useful in Chapter 7. 


2.5.5 Exercises 


In the following exercises, whenever discussing a vector space $(V, +, \cdot)$ where $+$ and $\cdot$ are the standard
operations or clear from the context, we will simplify notation and write only $V$.


1. Describe or sketch each of the following subsets of $\mathbb{R}^k$ (for the appropriate values of $k$). Which
are subspaces? Show that all conditions for a subspace are satisfied, or if the set is not a subspace,
give an example demonstrating that one of the conditions fails.
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(c) $= ( =) rer} cr’ 
la| 
(d) S= a aeR$}cR 
(: 
0 In(2) 
(ec) S=icy] 1 J+oeo 0 c1.02€R$ CR. 
1/2 1 


2. Which of these subsets are subspaces of $\mathcal{M}_{2 \times 2}(\mathbb{R})$? For each that is not, show the condition that
fails.


ea a.beR} 
© { (65) a+b=o} 
ee, a+o=s} 


(d) ic. a+b=0,ceRy. 


3. Consider the vector space $\mathbb{R}^2$ and the following subsets $W$. For each, determine if (i) $W$ is closed
under vector addition, (ii) $W$ is closed under scalar multiplication, (iii) $W$ is a subspace of $\mathbb{R}^2$, and
(iv) sketch $W$ in the $xy$-coordinate plane and illustrate your findings.

(a) $W = \{(1, 2), (2, 1)\}$.

(b) $W = \{(a, 2a) \in \mathbb{R}^2 \mid a \in \mathbb{R}\}$.

(c) $W = \{(3a - 1, 2a + 1) \in \mathbb{R}^2 \mid a \in \mathbb{R}\}$.

(d) $W = \{(a, b) \in \mathbb{R}^2 \mid ab > 0\}$.


4. Recall the space $\mathcal{P}_2$ of degree two polynomials (see Example 2.4.2). For what scalar values of $b$
is $W = \{a_0 + a_1 x + a_2 x^2 \mid a_0 + 2a_1 + a_2 = b\}$ a subspace of $\mathcal{P}_2$?


5. Recall the space of all bar graphs with 11 bins (see Definition 2.4.16). We are interested in
the bar graphs that have at most one bar with a value higher than both, or lower than both, of two
nearest neighbors. Notice that end bars don't have two nearest neighbors (see Figure 2.22). Call
this subset $G$.


(a) Is G a subspace of H? 

(b) We typically expect grade distributions to follow a bell curve (thus having a histogram that is 
a bar graph in G). What does your conclusion in part 5a say about a course grade distribution 
if homework, lab, and exam grade distributions are elements of G? 


6. What is the smallest subspace of $\mathbb{R}^4$ containing the following vectors:

$$\begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}?$$

Explain your answer using ideas from this section.

7. Use the discussion about subspaces described by linear equations with 2 variables to discuss the
different types of solution sets of a homogeneous system of equations with 2 equations and 2
variables. Include sketches of the lines corresponding to the equations.

8. Use the discussion about subspaces described by linear equations with 3 variables to discuss the
different types of solution sets of a homogeneous system of equations with 3 equations and 3
variables. Include sketches of the planes corresponding to the equations.

9. Is $\mathbb{R}^2$ a subspace of $\mathbb{R}^3$? Explain.

10. Show that if a plane in $\mathbb{R}^3$ contains the origin, then the set of all points on the plane forms a vector
space.

11. Prove: Let $v \in \mathbb{R}^n$, and suppose that $V$ is a subspace of $(\mathbb{R}^n, +, \cdot)$ containing $v$. Then the line
containing the origin and $v$ is contained in $V$.

12. Show that the set of all arithmetic combinations of images A, B, and C (page 11) is a subspace of
the vector space of $4 \times 4$ images.

13. A manufacturing company uses a process called diffusion welding to adjoin several smaller rods
into a single longer rod. The diffusion welding process leaves the final rod heated to various
temperatures along the rod with the ends of the rod having the same temperature. Every $a$ cm
along the rod, a machine records the temperature difference from the temperature at the ends to
get an array of temperatures called a heat state. (See Section 2.4.1.)

(a) Plot the heat state given below (let the horizontal axis represent distance from the left end of
the rod and the vertical axis represent the temperature difference from the ends).

$u = (0, 1, 13, 14, 12, 5, -2, -11, -3, 1, 10, 11, 9, 7, 0)$.

(b) How long is the rod represented by $u$, the above heat state, if $a = 1$ cm?

(c) Give another example of a heat state for the same rod, sampled in the same locations.

(d) Show that the set of all heat states, for this rod, is a vector space.

(e) What are the distinguishing properties of vectors in this vector space?


14. Show that a subset of a vector space satisfies (P9) as long as (P2) and (P8) are satisfied. [Hint:
Theorem 2.3.17 may be useful.]

15. Prove Corollary 2.5.4.

16. Prove Corollary 2.5.6.

17. Prove that the set of all differentiable real-valued functions on $\mathbb{R}$ is a subspace of F. See
Example 2.4.1.

18. Prove that the empty set, $\emptyset$, is not a subspace of any vector space.

19. Prove that the set containing only the zero vector of a vector space $V$ is a subspace of $V$. How
many different ways can you prove this?

20. The smallest subspace in $\mathbb{R}^n$ containing two vectors $u$ and $v$ that do not lie on the same line through
the origin is the plane containing $u$, $v$, and $0$. What if a subspace $V$ of $(\mathbb{R}^3, +, \cdot)$ contains a plane
through the origin and another line $L$ not in the plane? What can you say about the subspace?

21. Give an example that shows that the union of subspaces is not necessarily a subspace.

22. Consider the sets

$$W_1 = \{ax^2 + bx + c \in \mathcal{P}_2(\mathbb{R}) \mid a = c\},$$

$$W_2 = \{ax^2 + bx + c \in \mathcal{P}_2(\mathbb{R}) \mid a = 2c\}.$$

Show that both are subspaces of $\mathcal{P}_2$. Are $W_1 \cap W_2$ and $W_1 \cup W_2$ subspaces of $\mathcal{P}_2$?


Fig. 2.22 Examples of two histograms in G (left and middle) and an example of a histogram not in G (right).


Fig. 2.23 Left image: $C_1$. Right image: $C_2$. Here black has a pixel intensity of 0 and white has a pixel intensity of 1.

23. Prove that if $U_1$ and $U_2$ are subspaces of $V$ such that $V = U_1 \oplus U_2$, then for each $v \in V$ there
exist unique vectors $u_1 \in U_1$ and $u_2 \in U_2$ such that $v = u_1 + u_2$.


For exercises 24 through 26, let $V$ be the set of $4 \times 4$ grayscale images and consider the example
images on page 11. Let $U_Q$ be the set of all scalar multiples of image Q.


24. Complete Example 2.5.25 by showing, without Theorem 2.5.26, that $U_1 + U_2$ is a subspace of
the vector space of $4 \times 4$ grayscale images.

25. Which of images 1, 2, 3, and 4 are in $U_A + U_B + U_C$?

26. Show that $V \neq U_A \oplus U_B \oplus U_C$.


For exercises 27 through 31, let $V = \mathcal{I}_{256 \times 256}$, the set of $256 \times 256$ grayscale images and consider
the two images in Figure 2.23. Let $W_1$ be the set of all scalar multiples of image $C_1$ and $W_2$ be
the set of all scalar multiples of image $C_2$. Recall that although the display range (or brightness
in the displayed images) may be limited, the actual pixel intensities are not.


27. Describe the elements in $W_1$. Using the definition of scalar multiplication on images and vector
addition on images, describe how you know that $W_1$ is a subspace of $V$.
28. Describe the elements in $W_1 + W_2$. Is $W_1 + W_2$ a subspace of $V$? Explain.
29. Describe the elements in $W_1 \cup W_2$. Is $W_1 \cup W_2$ a subspace of $V$? Explain.
30. Describe the elements in $W_1 \cap W_2$. Is $W_1 \cap W_2$ a subspace of $V$? Explain.
31. Describe how you know that $V \neq W_1 \oplus W_2$.


For Exercises 32 to 35, let V denote the space of images with hexagonal grid configuration shown 
here: 


32. Let H be the subset of V consisting of all images whose outer ring of pixels all have value zero. 
Is H a subspace of V? 

33. Let K be the subset of V consisting of all images whose center pixel has value zero. Is K a 
subspace of V? 

34. What images are included in the set $H \cap K$? Is $H \cap K$ a subspace of $V$?

35. What images are included in the set $H \cup K$? Is $H \cup K$ a subspace of $V$?



3 Vector Space Arithmetic and Representations


We have seen that it is often quite useful in practice to be able to write a vector as a combination of scalar 
products and sums of other specified vectors. For example, recall again the task from Section 2.1, where 
we considered the 4 x 4 images on page 11. We showed that Image 2 could be formed by performing 
image addition and scalar multiplication on Images A, B, and C. In particular, we found that 


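$$\text{Image 2} = \tfrac{1}{2} \cdot \text{Image A} + 0 \cdot \text{Image B} + 1 \cdot \text{Image C}.$$


On the other hand, we found that Image 4 could not be formed using any combination of image addition
and scalar multiplication with Images A, B, and C. That is, there are no scalars $\alpha$, $\beta$, and $\gamma$ so that

$$\text{Image 4} = \alpha \cdot \text{Image A} + \beta \cdot \text{Image B} + \gamma \cdot \text{Image C}.$$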


In this chapter, we will explore vector spaces and subspaces of vector spaces that can be described 
using linear combinations of particular vectors. We will also discuss optimal sets of vectors that can 
be used to describe a vector space. Finally, we will discuss a more concrete way to represent vectors 
in abstract vector spaces. 


3.1 Linear Combinations


Suppose we are working within a subspace for which all the radiographs satisfy a particular (significant)
property, call it property S. This means that the subspace is defined as the set of all radiographs with
property S. Because the subspace is not trivial (that is, it contains more than just the zero radiograph)
it consists of an infinite number of radiographs. Suppose also that we have a handful of radiographs
that we know are in this subspace, but then a colleague brings us a new radiograph, $r$, one with which
we have no experience and the colleague needs to know whether $r$ has property S. Since the set of
radiographs defined by property S is a subspace, we can perform a quick check to see if $r$ can be formed
from those radiographs with which we are familiar, using arithmetic operations. If we find the answer
to this question is "yes," then we know $r$ has property S. We know this because subspaces are closed
under scalar multiplication and vector addition. If we find the answer to be "no," we still have more
work to do. We cannot yet conclude whether or not $r$ has property S because there may be radiographs
with property S that are still unknown to us.

We have also been exploring one-dimensional heat states on a finite interval. We have seen that 
the subset of heat states with fixed (zero) endpoint temperature differential is a subspace of the vector 
space of heat states. The collection of vectors in this subspace is relatively easy to identify: finite- 
valued and zero at the ends. However, if a particular heat state on a rod could cause issues with future 
functioning of a diffusion welder, an engineer might be interested in whether the subspace of possible 
heat states might contain this detrimental heat state. We may wish to determine if one such heat state 
is an arithmetic combination of several others. 

In Section 3.1.1, we introduce the terminology of linear combinations for describing when a 
vector can be formed from a finite number of arithmetic operations on a specified set of vectors. 
In Sections 3.1.3 and 3.1.4 we consider linear combinations of vectors in Euclidean space ($\mathbb{R}^n$) and
connect such linear combinations to the inhomogeneous and homogeneous matrix equations Ax = b 
and Ax = 0, respectively. Finally, in Section 3.1.5, we discuss the connection between inhomogeneous 
and homogeneous systems. 


3.1.1 Linear Combinations


We now assign terminology to describe vectors that have been created from (a finite number of) 
arithmetic operations with a specified set of vectors. 


Definition 3.1.1 


Let $(V, +, \cdot)$ be a vector space over $\mathbb{F}$. Given a finite set of vectors $v_1, v_2, \ldots, v_k \in V$, we say that
the vector $w \in V$ is a linear combination of $v_1, v_2, \ldots, v_k$ if $w = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_k v_k$ for
some scalar coefficients $\alpha_1, \alpha_2, \ldots, \alpha_k \in \mathbb{F}$.


Corollary 2.5.4 says that a subspace is a nonempty subset of a vector space that is closed under 
scalar multiplication and vector addition. Using this new terminology, we can say that a subspace is 
closed under linear combinations. 

Following is an example in the vector space $\mathcal{I}_{4 \times 4}$ of $4 \times 4$ images.


Example 3.1.2 Consider the $4 \times 4$ grayscale images from page 11. Image 2 is a linear combination
of Images A, B, and C with scalar coefficients $\tfrac{1}{2}$, 0, and 1, respectively, because

$$\text{Image 2} = \tfrac{1}{2} \cdot \text{Image A} + 0 \cdot \text{Image B} + 1 \cdot \text{Image C}.$$


Watch Your Language! When communicating whether or not a vector can be written as a linear
combination of other vectors, you should recognize that the term "linear combination" is a property
applied to vectors, not sets. So, we make statements such as

✓ $w$ is a linear combination of $v_1, v_2, v_3$.
✓ $w$ is not a linear combination of $v_1, v_2, v_3, \ldots, v_n$.
✓ $w$ is a linear combination of vectors in $U = \{v_1, v_2, v_3\}$.

We do not say

✗ $w$ is a linear combination of $U$.


In the remainder of this section, we focus on the question: Can a given vector be written as a linear 
combination of vectors from some specified set of vectors? 

In Section 2.2, we set up systems of linear equations to answer such questions. For example, on
page 14, we considered whether Image C is a linear combination of Images A, D, and F. To determine
the answer to our inquiry, we solved the corresponding system of equations and found that, indeed, it is.
Also, in Section 2.2, we asked whether it is possible to find scalars $\alpha$, $\beta$, and $\gamma$ so that

$$\text{Image A} = \alpha \cdot \text{Image D} + \beta \cdot \text{Image E} + \gamma \cdot \text{Image F}.$$

In this case, we used systems of equations to determine that no such scalars exist and so Image A is
not a linear combination of Images D, E, and F.


Example 3.1.3 This example illustrates a subtle, yet important, property of linear combinations, the
reason we require them to be finite sums. Consider the set $T = \{1, x, x^2, x^3, \ldots\}$ whose vectors are
in the vector space of all polynomials. We know, from Calculus, that the Taylor series for cosine is

$$\cos x = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots.$$

According to the definition of linear combination, this series is not a linear combination of the vectors
in $T$ because it does not consist of only finitely many (nonzero) terms. We also know by definition
that every polynomial can be written as a linear combination of the vectors in $T$. Since $\cos x$ cannot
be written in this way¹, we have no trouble saying that $\cos x$ is not a polynomial.

¹ Do you see why? Hint: how many extrema does a polynomial have?


If we can write a vector, $v$, as an arithmetic combination of infinitely many vectors, $u_0, u_1, u_2, \ldots$, like

$$v = \sum_{k=0}^{\infty} a_k u_k, \quad \text{for some scalars } a_k,$$

we may still be able to write $v$ as the arithmetic combination of only finitely many vectors, like

$$v = \sum_{k=0}^{N} b_k u_k, \quad \text{for some scalars } b_k.$$

In this case, $v$ is a linear combination of $u_0, u_1, \ldots, u_N$.


Watch Your Language! Remember that linear combinations can only have finitely many terms. Let $S$ be
the infinite set $\{u_1, u_2, u_3, \ldots\}$. We can say

✓ Since $v = 3u_1 + 5u_{12} - 72u_{35}$, $v$ is a linear combination of $u_1, u_{12}, u_{35}$.

✓ If $v = \sum_{k=1}^{\infty} a_k u_k$ with $a_k \neq 0$ only for $k = 1, 12, 35$, then $v$ is a linear combination of $u_1, u_{12}, u_{35}$.

It is incorrect to say

✗ If $v = \sum_{k=1}^{\infty} a_k u_k$, then $v$ is a linear combination of the elements of $S$.


We finish this section with additional examples where we reformulate and answer the linear 
combination question using systems of linear equations. 


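Example 3.1.4 Consider the vector space $\mathbb{R}^3$ with the standard operations and field. Can the vector
$w$ be written as a linear combination of the vectors $v_1, v_2, v_3$, where

$$w = \begin{pmatrix} 2 \\ 5 \\ 3 \end{pmatrix}, \quad v_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix}?$$

Equivalently, do coefficients $x_1, x_2, x_3$ exist so that $w = x_1 v_1 + x_2 v_2 + x_3 v_3$? More explicitly, we
seek coefficients such that

$$\begin{pmatrix} 2 \\ 5 \\ 3 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + x_2 \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} + x_3 \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix}. \tag{3.1}$$

Because scalar multiplication and vector addition are defined element-wise, we have the equivalent set
of conditions

$$\begin{aligned}
2 &= x_1 + 2x_2 + 3x_3 \\
5 &= 2x_2 + x_3 \\
3 &= x_1 + x_2 + 4x_3.
\end{aligned} \tag{3.2}$$

So, our original question: Is it possible to write $w$ as a linear combination of the vectors $v_1, v_2, v_3$?
can be reformulated as the question of whether or not the system of equations (3.2) has a solution. We
write and row reduce the associated augmented matrix:

$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 2 \\ 0 & 2 & 1 & 5 \\ 1 & 1 & 4 & 3 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & -23/3 \\ 0 & 1 & 0 & 4/3 \\ 0 & 0 & 1 & 7/3 \end{array}\right), \tag{3.3}$$

and see that unique coefficients for the desired linear combination are $x_1 = -23/3$, $x_2 = 4/3$, and
$x_3 = 7/3$.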
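The row reduction in Example 3.1.4 can be reproduced with exact arithmetic. The sketch below is one possible implementation of Gauss-Jordan elimination, written only for illustration; the function name `rref` and the variable names are assumptions, not notation from the text. It uses Python's `fractions.Fraction` so that thirds are not rounded.

```python
from fractions import Fraction


def rref(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, n_rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        # Scale the pivot row so the pivot entry becomes 1.
        scale = M[pivot_row][col]
        M[pivot_row] = [x / scale for x in M[pivot_row]]
        # Eliminate the pivot column from every other row.
        for r in range(n_rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M


# Augmented matrix of the system in Example 3.1.4: columns v1, v2, v3 | w.
augmented = [[1, 2, 3, 2],
             [0, 2, 1, 5],
             [1, 1, 4, 3]]
reduced = rref(augmented)
coefficients = [row[-1] for row in reduced]
print(coefficients)
```

Reading the last column of the reduced matrix gives the coefficients $x_1, x_2, x_3$ when, as here, every variable column contains a pivot.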


We now consider a similar example in the vector space of degree 2 polynomials P2. 


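Example 3.1.5 Consider the following vectors in $\mathcal{P}_2$:

$$v_1 = 3x + 4, \quad v_2 = 2x + 1, \quad v_3 = x^2 + 2, \quad \text{and} \quad v_4 = x^2.$$

The polynomial $v_1$ can be written as a linear combination of the polynomials $v_2, v_3, v_4$ if there exist
scalars $\alpha$, $\beta$, and $\gamma$ so that

$$v_1 = \alpha v_2 + \beta v_3 + \gamma v_4.$$

If such scalars exist, then

$$3x + 4 = \alpha(2x + 1) + \beta(x^2 + 2) + \gamma(x^2).$$

We match up like terms and obtain the following system of equations:

$$\begin{aligned}
(x^2 \text{ term}) \quad & 0 = \beta + \gamma \\
(x \text{ term}) \quad & 3 = 2\alpha \\
(\text{constant term}) \quad & 4 = \alpha + 2\beta.
\end{aligned}$$

Again, we can turn this system into an augmented matrix and row reduce:

$$\left(\begin{array}{ccc|c} 0 & 1 & 1 & 0 \\ 2 & 0 & 0 & 3 \\ 1 & 2 & 0 & 4 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & 3/2 \\ 0 & 1 & 0 & 5/4 \\ 0 & 0 & 1 & -5/4 \end{array}\right). \tag{3.4}$$

The solution to this system, then, is $\alpha = \tfrac{3}{2}$, $\beta = \tfrac{5}{4}$, and $\gamma = -\tfrac{5}{4}$. In other words,

$$v_1 = \tfrac{3}{2} v_2 + \tfrac{5}{4} v_3 - \tfrac{5}{4} v_4,$$

and so $v_1$ can be written as a linear combination of $v_2$, $v_3$, and $v_4$.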
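The conclusion of Example 3.1.5 can be double-checked by storing each polynomial as its list of coefficients. This is a minimal sketch under stated assumptions: polynomials in $\mathcal{P}_2$ are represented as tuples (constant, $x$ coefficient, $x^2$ coefficient), the helper name `combine` is hypothetical, and $v_3$ is taken to be $x^2 + 2$, as the system of equations in the example implies.

```python
from fractions import Fraction


def combine(scalars, polys):
    """Linear combination of polynomials stored as (constant, x, x^2) tuples."""
    return tuple(
        sum(c * p[i] for c, p in zip(scalars, polys))
        for i in range(3)
    )


v1 = (4, 3, 0)   # 3x + 4
v2 = (1, 2, 0)   # 2x + 1
v3 = (2, 0, 1)   # x^2 + 2
v4 = (0, 0, 1)   # x^2

# Scalars alpha, beta, gamma read from the row-reduced matrix in the example.
scalars = (Fraction(3, 2), Fraction(5, 4), Fraction(-5, 4))
result = combine(scalars, (v2, v3, v4))

# The combination reproduces v1 coefficient by coefficient.
print(result == v1)
```

Matching like terms in the example is exactly the comparison of these coefficient tuples entry by entry.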
We give a final example, this time in the vector space $D(Z_2)$ of 7-bar LCD images.


Example 3.1.6 Recall Example 2.4.17, in which we saw two examples of linear combinations of LCD characters.

[Figure: two equations, each writing one LCD character as a sum of other LCD characters.]

In words, these say that one LCD character is a linear combination of two others, and that a second LCD character
is a linear combination of three others. The interesting part of this example is that the scalar set
for $D(Z_2)$ has only two elements. So, if we want to determine whether the "9" digit is a linear
combination of the other 9 digits, we ask whether there are scalars $\alpha_0, \alpha_1, \ldots, \alpha_8 \in \{0, 1\}$ so that

$$\alpha_0 \cdot [0] + \alpha_1 \cdot [1] + \alpha_2 \cdot [2] + \cdots + \alpha_8 \cdot [8] = [9],$$

where $[k]$ stands for the LCD image of the digit $k$.
I om) fi hel 


We can then set up a system of equations so that there’s an equation for each bar. For example, the 
equation representing the top bar is 


$$\alpha_0 + \alpha_2 + \alpha_3 + \alpha_5 + \alpha_6 + \alpha_7 + \alpha_8 = 1.$$


Exercise 21 continues this investigation. 


3.1.2 Matrix Products 


In this section, we state definitions and give examples of matrix products. We expect that most linear 
algebra students have already learned to multiply matrices, but realize that some students may need 
a reminder. As with $\mathbb{R}$, $\mathcal{M}_{m \times n}$ can be equipped with another operation besides addition and scalar
multiplication, namely matrix multiplication. But unlike matrix addition,
the matrix product is not defined component-wise.


Definition 3.1.7 


Given an $n \times m$ matrix $A = (a_{i,j})$ and an $m \times \ell$ matrix $B = (b_{i,j})$, we define the matrix product
of $A$ and $B$ to be the $n \times \ell$ matrix $AB = (c_{i,j})$ where

$$c_{i,j} = \sum_{k=1}^{m} a_{i,k} b_{k,j}.$$

We call this operation matrix multiplication.


Notation. There is no "$\cdot$" between the matrices; rather, they are written in juxtaposition to show a
difference between the notation of a scalar product and the notation of a matrix product. The definition
requires that the number of columns of $A$ is the same as the number of rows of $B$ for the product $AB$.
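Definition 3.1.7 translates almost directly into code. The sketch below is an illustration only: the function name `mat_mul` and the two small matrices are hypothetical choices, not taken from the text. Matrices are stored as lists of rows, and the size requirement from the notation remark is checked before any entry is computed.

```python
def mat_mul(A, B):
    """Matrix product following c[i][j] = sum over k of a[i][k] * b[k][j]."""
    n, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("number of columns of A must equal number of rows of B")
    ell = len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(m)) for j in range(ell)]
        for i in range(n)
    ]


# Illustrative matrices (assumed for this sketch): A is 2 x 3 and B is 3 x 2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [1, 1]]

C = mat_mul(A, B)   # a 2 x 2 matrix
print(C)

# B times B is not defined: B has 2 columns but 3 rows.
try:
    mat_mul(B, B)
except ValueError as err:
    print("undefined product:", err)
```

The product of an $n \times m$ matrix and an $m \times \ell$ matrix has $n$ rows and $\ell$ columns, which is why `C` here is $2 \times 2$.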


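Example 3.1.8 Let

$$P = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}, \quad Q = \begin{pmatrix} 2 & 4 \\ 1 & -1 \\ 2 & 4 \end{pmatrix}, \quad \text{and} \quad R = \begin{pmatrix} 2 & 0 \\ 1 & -2 \end{pmatrix}.$$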


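Since both $P$ and $Q$ are $3 \times 2$ matrices, we see that the number of columns of $P$ is not the same as the
number of rows of $Q$. Thus, $PQ$ is not defined. But, since $P$ has 2 columns and $R$ has 2 rows, $PR$ is
defined. Let's compute the matrix product $PR$. We can compute each entry as in Definition 3.1.7.

Position   Computation
$(i, j)$   $p_{i,1} r_{1,j} + p_{i,2} r_{2,j}$
$(1, 1)$   $1 \cdot 2 + 2 \cdot 1$
$(1, 2)$   $1 \cdot 0 + 2 \cdot (-2)$
$(2, 1)$   $3 \cdot 2 + 4 \cdot 1$
$(2, 2)$   $3 \cdot 0 + 4 \cdot (-2)$
$(3, 1)$   $5 \cdot 2 + 6 \cdot 1$
$(3, 2)$   $5 \cdot 0 + 6 \cdot (-2)$

Typically, when writing this out, we write it as

$$\begin{aligned}
PR &= \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 1 & -2 \end{pmatrix} \\
&= \begin{pmatrix} 1 \cdot 2 + 2 \cdot 1 & 1 \cdot 0 + 2 \cdot (-2) \\ 3 \cdot 2 + 4 \cdot 1 & 3 \cdot 0 + 4 \cdot (-2) \\ 5 \cdot 2 + 6 \cdot 1 & 5 \cdot 0 + 6 \cdot (-2) \end{pmatrix} \\
&= \begin{pmatrix} 4 & -4 \\ 10 & -8 \\ 16 & -12 \end{pmatrix}.
\end{aligned}$$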

In the above example, the result of the matrix product was a matrix of the same size as P. Let’s do 
another example to show that this is not always the case. 


Example 3.1.9 Let 
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We see that A is 2 x 3 and B is 3 x 4. So the product AB is defined and is a 2 x 4 matrix. Following 
the definition of matrix multiplication, we find AB below. 


1 02 
ap=(5 91) 1-1 11 


In Example 3.1.8, $RP$ is not defined, so $RP \neq PR$. Similarly, in Example 3.1.9, $BA \neq AB$. But,
one might wonder whether matrix multiplication is commutative when the matrix product is defined
both ways. That is, we might wonder if $AB = BA$ when both $AB$ and $BA$ are defined, for instance, when both
$A$ and $B$ are $n \times n$ (square) matrices for some $n$. Let us explore this in the next example.


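Example 3.1.10 Let

$$A = \begin{pmatrix} 1 & 2 \\ 3 & -1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 0 \\ -2 & 3 \end{pmatrix}.$$

Let us compute both $AB$ and $BA$.

$$\begin{aligned}
AB &= \begin{pmatrix} 1 & 2 \\ 3 & -1 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ -2 & 3 \end{pmatrix} \\
&= \begin{pmatrix} 1 \cdot 2 + 2 \cdot (-2) & 1 \cdot 0 + 2 \cdot 3 \\ 3 \cdot 2 + (-1) \cdot (-2) & 3 \cdot 0 + (-1) \cdot 3 \end{pmatrix} \\
&= \begin{pmatrix} -2 & 6 \\ 8 & -3 \end{pmatrix},
\end{aligned}$$

$$\begin{aligned}
BA &= \begin{pmatrix} 2 & 0 \\ -2 & 3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & -1 \end{pmatrix} \\
&= \begin{pmatrix} 2 \cdot 1 + 0 \cdot 3 & 2 \cdot 2 + 0 \cdot (-1) \\ -2 \cdot 1 + 3 \cdot 3 & -2 \cdot 2 + 3 \cdot (-1) \end{pmatrix} \\
&= \begin{pmatrix} 2 & 4 \\ 7 & -7 \end{pmatrix}.
\end{aligned}$$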

Even in the simple case of 2 x 2 matrices, matrix multiplication is not commutative. 
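A quick numerical check makes the failure of commutativity concrete. This is a sketch for illustration only: the helper name `mat_mul` is hypothetical, and the entries of the two matrices are read off from the entry-by-entry computations in Example 3.1.10.

```python
def mat_mul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    inner = len(B)
    return [
        [sum(row[k] * B[k][j] for k in range(inner)) for j in range(len(B[0]))]
        for row in A
    ]


A = [[1, 2],
     [3, -1]]
B = [[2, 0],
     [-2, 3]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)

# Both products are defined and both are 2 x 2, yet they differ.
print(AB)
print(BA)
print(AB == BA)
```

One pair of square matrices with $AB \neq BA$ is all that is needed to conclude that matrix multiplication is not commutative in general.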


Because matrix multiplication is not generally defined between two matrices in $\mathcal{M}_{m \times n}$, we can see
that $\mathcal{M}_{m \times n}$ is not, in general, closed under matrix multiplication. Since the matrix product (for brevity,
let's call it $\otimes$) is not commutative, we can also see that $(\mathcal{M}_{n \times n}, \otimes, \cdot)$ is not a vector space over $\mathbb{R}$.

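A special case of matrix multiplication is the computation of a matrix-vector product $Mv$, where
$M$ is in $\mathcal{M}_{m \times n}$ and $v$ is a vector in $\mathbb{R}^n$. Here we show an example of such a product.


Example 3.1.11 Let $M = \begin{pmatrix} -1 & 0 & 3 \\ -2 & 1 & 3 \end{pmatrix}$ and $v = \begin{pmatrix} -1 \\ 1 \\ 2 \end{pmatrix}$. The product is

$$\begin{aligned}
Mv &= \begin{pmatrix} -1 & 0 & 3 \\ -2 & 1 & 3 \end{pmatrix} \begin{pmatrix} -1 \\ 1 \\ 2 \end{pmatrix} \\
&= \begin{pmatrix} -1 \cdot (-1) + 0 \cdot 1 + 3 \cdot 2 \\ -2 \cdot (-1) + 1 \cdot 1 + 3 \cdot 2 \end{pmatrix} \\
&= \begin{pmatrix} 7 \\ 9 \end{pmatrix}.
\end{aligned}$$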

Let us consider one more example that will connect matrix-vector products to systems of equations. 


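Example 3.1.12 Let

$$A = \begin{pmatrix} 2 & 3 & 0 \\ 1 & -1 & -1 \\ 0 & 4 & 2 \end{pmatrix} \quad \text{and} \quad X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$

Since $A$ is $3 \times 3$ and $X$ is $3 \times 1$, we can multiply to get $AX$. Remember that the number of columns
of $A$ must match the number of rows of $X$ in order for $AX$ to make sense. The product $AX$ must be a
$3 \times 1$ matrix. Multiplying gives

$$AX = \begin{pmatrix} 2 & 3 & 0 \\ 1 & -1 & -1 \\ 0 & 4 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2 \cdot x + 3 \cdot y + 0 \cdot z \\ 1 \cdot x + (-1) \cdot y + (-1) \cdot z \\ 0 \cdot x + 4 \cdot y + 2 \cdot z \end{pmatrix}.$$

First, notice that the resulting vector looks very much like the variable side of a system of equations.
In fact, consider the question, "Are there real numbers $x$, $y$, and $z$ so that

$$\begin{aligned}
2x + 3y &= 1 \\
x - y - z &= 3 \\
4y + 2z &= -4?"
\end{aligned}$$

We see that this is identical to asking the question, "Given the matrix

$$A = \begin{pmatrix} 2 & 3 & 0 \\ 1 & -1 & -1 \\ 0 & 4 & 2 \end{pmatrix}$$

and vector $b = \begin{pmatrix} 1 \\ 3 \\ -4 \end{pmatrix}$, is there a vector $X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$ so that $AX = b$?"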


Although matrix product cannot play the role of vector space addition, this operation does play a 
major role in important linear algebra concepts. In fact, we will begin weaving matrix products into 
linear algebra in the next section. 


3.1.3 The Matrix Equation Ax = b 


In Section 3.1.1, we used systems of equations and augmented matrices to answer questions about linear 
combinations. In this section, we will consider the link between linear combinations and matrix-vector 
products. 

We can use the ideas from Section 3.1.2 about matrix-vector products to rewrite a system of equations 
as a matrix equation. To formalize, we define, in a way similar to images, what it means for two matrices 
to be equal. 


Definition 3.1.13 


Given two matrices $A = (a_{i,j})$ and $B = (b_{i,j})$, we say that $A$ and $B$ are equal, and write $A = B$,
if and only if $A$ and $B$ have the same dimensions and $a_{i,j} = b_{i,j}$ for every pair $i, j$. That is,
corresponding entries of $A$ and $B$ are equal.


We now use this definition to rewrite a system of equations. 


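Example 3.1.14 Consider the system of linear equations given by

$$\begin{aligned}
2a + 3b - c &= 2 \\
a - 2b + c &= 1 \\
a + 5b - 2c &= 1.
\end{aligned}$$

This system is equivalent to the following vector equation

$$\begin{pmatrix} 2a + 3b - c \\ a - 2b + c \\ a + 5b - 2c \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}.$$

Finally, we can rewrite this vector equation using the matrix-vector product:

$$\begin{pmatrix} 2 & 3 & -1 \\ 1 & -2 & 1 \\ 1 & 5 & -2 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix},$$

or

$$Au = v,$$

where

$$u = \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \quad v = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix},$$

and $A$ is the corresponding coefficient matrix

$$A = \begin{pmatrix} 2 & 3 & -1 \\ 1 & -2 & 1 \\ 1 & 5 & -2 \end{pmatrix}.$$

These observations allow us to connect solutions to systems of equations with solutions to matrix
equations. The following theorem makes this connection precise.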


Theorem 3.1.15
Let $A$ be an $m \times n$ matrix and let $b \in \mathbb{R}^m$. The matrix equation $Ax = b$ has the same solution
set as the system of equations with augmented coefficient matrix $[A \mid b]$.


In this theorem, we are not distinguishing between the point $(\alpha_1, \alpha_2, \ldots, \alpha_n)$ and the vector
$\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}$
for the solution to a system of equations as the difference is only notational.


Proof. In this proof, we will show that if u is a solution to Ax = b then uw is also a solution to the 
system of equations corresponding to [A|b]. Then, we will show that if u is a solution to the system of 


1,1 41,2 +++ Ain 
: ; ; ; a2,1 42,2 +++ A2n 
equations corresponding to [A|b], then uw is also a solution to Ax = b. Let A = . . 
Am,1 Gm,2 °** Gm,n 
be an m x n matrix and let 
by 
b2 
b=] . |.€R” 
Din 
Q) 
a2 
Suppose, also, thatu = | , | € R” isa solution to the matrix equation Ax = b. Then, 
On 
Q1,1 1,2 °** Ain ay by 
a2,1 2,2 +++ A2.n a2 b2 
Gm,1 4m,2 °** Amn An bm 


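Multiplying the left-hand side gives the vector equation

\[
\begin{pmatrix}
a_{1,1}\alpha_1 + a_{1,2}\alpha_2 + \cdots + a_{1,n}\alpha_n \\
a_{2,1}\alpha_1 + a_{2,2}\alpha_2 + \cdots + a_{2,n}\alpha_n \\
\vdots \\
a_{m,1}\alpha_1 + a_{m,2}\alpha_2 + \cdots + a_{m,n}\alpha_n
\end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}.
\]

Thus,

\[
\begin{aligned}
a_{1,1}\alpha_1 + a_{1,2}\alpha_2 + \cdots + a_{1,n}\alpha_n &= b_1 \\
a_{2,1}\alpha_1 + a_{2,2}\alpha_2 + \cdots + a_{2,n}\alpha_n &= b_2 \\
&\;\;\vdots \\
a_{m,1}\alpha_1 + a_{m,2}\alpha_2 + \cdots + a_{m,n}\alpha_n &= b_m.
\end{aligned}
\]

Therefore, $(\alpha_1, \alpha_2, \ldots, \alpha_n)$ is a solution to the system of equations

\[
\begin{aligned}
a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n &= b_1 \\
a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,n}x_n &= b_m.
\end{aligned}
\]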
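Now, assume that the system of equations with augmented matrix $[A \mid b]$ has a solution. Then there are
values $\alpha_1, \alpha_2, \ldots, \alpha_n$ so that

\[
u = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}
\]

is a solution to the system of equations

\[
\begin{aligned}
a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n &= b_1 \\
a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,n}x_n &= b_m.
\end{aligned}
\]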


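Then, substituting $u$, we have

\[
\begin{aligned}
a_{1,1}\alpha_1 + a_{1,2}\alpha_2 + \cdots + a_{1,n}\alpha_n &= b_1 \\
a_{2,1}\alpha_1 + a_{2,2}\alpha_2 + \cdots + a_{2,n}\alpha_n &= b_2 \\
&\;\;\vdots \\
a_{m,1}\alpha_1 + a_{m,2}\alpha_2 + \cdots + a_{m,n}\alpha_n &= b_m.
\end{aligned}
\]

This gives us that

\[
\begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}.
\]

Therefore $Au = b$. Thus, $u$ is a solution to $Ax = b$.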


An important consequence of Theorem 3.1.15 is the following corollary. 


Corollary 3.1.16 
Let $A$ be an $m \times n$ matrix and let $b \in \mathbb{R}^m$. The matrix equation $Ax = b$ has a unique solution if
and only if the corresponding system of equations has a unique solution.

In the proof of Theorem 3.1.15 we see that every solution to the system is also a solution to the
matrix equation and every solution of the matrix equation is a solution to the corresponding system.
So, the result in Corollary 3.1.16 follows directly from the proof of Theorem 3.1.15. We only state it
separately because it is a very important result.

What does all this have to do with linear combinations? Consider again Example 3.1.4. 


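Example 3.1.17 In Example 3.1.4, we asked whether

\[
w = \begin{pmatrix} 2 \\ 5 \\ 3 \end{pmatrix}
\]

can be written as a linear combination of

\[
v_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}, \quad \text{and} \quad
v_3 = \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix}.
\]

We reformulated this question as the system of equations

\begin{align}
2 &= x_1 + 2x_2 + 3x_3 \tag{3.5} \\
5 &= 2x_2 + x_3 \tag{3.6} \\
3 &= x_1 + x_2 + 4x_3. \tag{3.7}
\end{align}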


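Based on the discussion in this section, we can rewrite this system of equations as a matrix equation
$Ax = b$, where

\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 1 & 1 & 4 \end{pmatrix}
\quad \text{and} \quad
b = \begin{pmatrix} 2 \\ 5 \\ 3 \end{pmatrix}.
\]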


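If we look closely at $A$, we see that the columns of $A$ are $v_1$, $v_2$, and $v_3$. Moreover, if $x$ is a solution
to the equation $Ax = b$, then

\[
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 1 & 1 & 4 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= x_1 \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}
+ x_2 \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}
+ x_3 \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix}
= \begin{pmatrix} 2 \\ 5 \\ 3 \end{pmatrix}.
\]

In other words, if $x$ is a solution to the equation $Ax = b$, then $b$ is a linear combination of $v_1$, $v_2$, and
$v_3$ with coefficients equal to $x_1$, $x_2$, and $x_3$.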


There was nothing special about the vectors $v_1$, $v_2$, $v_3$ in Example 3.1.4 that made the matrix product
relate to a linear combination. In fact, another way to represent the product of a matrix with a vector
is as follows.


Theorem 3.1.18 
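Let $A$ and $x$ be the matrix and vector given by

\[
A = \begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{pmatrix}
\quad \text{and} \quad
x = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}.
\]

Then, $Ax$ can be written as a linear combination of the columns of $A$. In particular,

\[
Ax = \alpha_1 c_1 + \alpha_2 c_2 + \cdots + \alpha_n c_n,
\]

where

\[
c_1 = \begin{pmatrix} a_{1,1} \\ a_{2,1} \\ \vdots \\ a_{m,1} \end{pmatrix}, \quad
c_2 = \begin{pmatrix} a_{1,2} \\ a_{2,2} \\ \vdots \\ a_{m,2} \end{pmatrix}, \quad \ldots, \quad
c_n = \begin{pmatrix} a_{1,n} \\ a_{2,n} \\ \vdots \\ a_{m,n} \end{pmatrix}.
\]
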
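Proof. Let $A$ and $x$ be defined as above. Also, let $c_1, c_2, \ldots, c_n$ be the columns of $A$. Using the definition
of a matrix-vector product, we have

\[
\begin{aligned}
Ax &= \begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} \\
&= \begin{pmatrix}
a_{1,1}\alpha_1 + a_{1,2}\alpha_2 + \cdots + a_{1,n}\alpha_n \\
a_{2,1}\alpha_1 + a_{2,2}\alpha_2 + \cdots + a_{2,n}\alpha_n \\
\vdots \\
a_{m,1}\alpha_1 + a_{m,2}\alpha_2 + \cdots + a_{m,n}\alpha_n
\end{pmatrix} \\
&= \begin{pmatrix} a_{1,1}\alpha_1 \\ a_{2,1}\alpha_1 \\ \vdots \\ a_{m,1}\alpha_1 \end{pmatrix}
+ \begin{pmatrix} a_{1,2}\alpha_2 \\ a_{2,2}\alpha_2 \\ \vdots \\ a_{m,2}\alpha_2 \end{pmatrix}
+ \cdots
+ \begin{pmatrix} a_{1,n}\alpha_n \\ a_{2,n}\alpha_n \\ \vdots \\ a_{m,n}\alpha_n \end{pmatrix} \\
&= \alpha_1 \begin{pmatrix} a_{1,1} \\ a_{2,1} \\ \vdots \\ a_{m,1} \end{pmatrix}
+ \alpha_2 \begin{pmatrix} a_{1,2} \\ a_{2,2} \\ \vdots \\ a_{m,2} \end{pmatrix}
+ \cdots
+ \alpha_n \begin{pmatrix} a_{1,n} \\ a_{2,n} \\ \vdots \\ a_{m,n} \end{pmatrix} \\
&= \alpha_1 c_1 + \alpha_2 c_2 + \cdots + \alpha_n c_n.
\end{aligned}
\]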
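
Theorem 3.1.18 can also be checked numerically. The Python sketch below computes $Ax$ in two ways for the matrix of Example 3.1.17: row by row, and as a combination of the columns of $A$. The function names `matvec` and `column_combination` are hypothetical helpers introduced only for this illustration, and the solution vector `x` was worked out by hand with elimination; it is an assumption of the sketch rather than a value stated in the text.

```python
from fractions import Fraction

def matvec(A, x):
    """Row-by-row definition of the matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def column_combination(A, x):
    """Compute x_1 c_1 + ... + x_n c_n, where c_j is the j-th column of A."""
    m, n = len(A), len(A[0])
    result = [Fraction(0)] * m
    for j in range(n):
        column_j = [A[i][j] for i in range(m)]
        result = [r + x[j] * c for r, c in zip(result, column_j)]
    return result

# Matrix and right-hand side from Example 3.1.17.
A = [[1, 2, 3],
     [0, 2, 1],
     [1, 1, 4]]
b = [2, 5, 3]

# A solution of Ax = b found by hand for this sketch (exact fractions).
x = [Fraction(-23, 3), Fraction(4, 3), Fraction(7, 3)]

# Both descriptions of the product agree ...
print(matvec(A, x) == column_combination(A, x))
# ... and the entries of x are coefficients writing b in terms of the columns.
print(column_combination(A, x) == b)
```

Exact fractions are used so that the comparisons are not affected by floating-point rounding.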


Now that we can connect systems of equations, matrix equations, and linear combinations, we can 
also connect the solutions to related questions. Below, we present these connections as a theorem. 


Theorem 3.1.19 
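Let $A$ be an $m \times n$ matrix and let $b \in \mathbb{R}^m$. Then the matrix equation $Ax = b$ has a solution if
and only if the vector $b$ can be written as a linear combination of the columns of $A$.

Proof. Let $c_1, c_2, \ldots, c_n$ be the columns of $A = (a_{i,j})$ and let

\[
b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}.
\]

To prove these statements are equivalent, we will show the following two statements.

($\Rightarrow$) If $Ax = b$ has a solution, then there are scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$ so that

\[
b = \alpha_1 c_1 + \alpha_2 c_2 + \cdots + \alpha_n c_n.
\]

($\Leftarrow$) If there are scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$ so that

\[
b = \alpha_1 c_1 + \alpha_2 c_2 + \cdots + \alpha_n c_n,
\]

then $Ax = b$ has a solution.

($\Rightarrow$) Suppose $Ax = b$ has a solution. Then, there is a vector

\[
u = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}
\]

so that $Au = b$. Then, using the connection between matrix-vector products and linear combinations
(Theorem 3.1.18), we know that $b = Au = \alpha_1 c_1 + \alpha_2 c_2 + \cdots + \alpha_n c_n$. Therefore, $b$ is a linear
combination of $c_1, c_2, \ldots, c_n$.

($\Leftarrow$) Now, suppose that $b$ is a linear combination of the columns of $A$. Then there are scalars
$\alpha_1, \alpha_2, \ldots, \alpha_n$ so that

\[
b = \alpha_1 c_1 + \alpha_2 c_2 + \cdots + \alpha_n c_n
= A \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}.
\]

Therefore

\[
u = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}
\]

is a solution to $Ax = b$.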


In the proof of Theorem 3.1.19 we see that a solution to $Ax = b$ is given by the vector in $\mathbb{R}^n$ made
up of the scalars in the linear combination.

Let us, now, return to another example with vectors in $\mathbb{R}^3$.


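Example 3.1.20 Here, we want to determine whether

\[
w = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} \in \mathbb{R}^3
\]

can be written as a linear combination of

\[
v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix}, \quad \text{and} \quad
v_3 = \begin{pmatrix} 2 \\ -2 \\ 2 \end{pmatrix}.
\]

We can write a vector equation to determine if there are scalars $\alpha$, $\beta$, and $\gamma$ so that

\[
w = \alpha v_1 + \beta v_2 + \gamma v_3.
\]

This leads to

\[
\begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}
= \alpha \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
+ \beta \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix}
+ \gamma \begin{pmatrix} 2 \\ -2 \\ 2 \end{pmatrix}.
\]

Matching up entries, we know that $\alpha$, $\beta$, and $\gamma$ must satisfy the following system of equations:

\[
\begin{aligned}
1 &= \alpha + \beta + 2\gamma \\
2 &= \alpha + 3\beta - 2\gamma \\
2 &= \alpha + \beta + 2\gamma.
\end{aligned}
\]

Using the method of elimination, we get

\[
\begin{aligned}
1 &= \alpha + \beta + 2\gamma \\
2 &= \alpha + 3\beta - 2\gamma \\
2 &= \alpha + \beta + 2\gamma
\end{aligned}
\quad \longrightarrow \quad
\begin{aligned}
1 &= \alpha + \beta + 2\gamma \\
1 &= 2\beta - 4\gamma \\
1 &= 0.
\end{aligned}
\]

We see that since the last equation is false, this system has no solution and therefore no scalars $\alpha$,
$\beta$, and $\gamma$ exist so that $w = \alpha v_1 + \beta v_2 + \gamma v_3$. That is, $w$ cannot be written as a linear combination of
$v_1$, $v_2$, and $v_3$.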
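
The elimination in Example 3.1.20 can be mirrored in a few lines of Python. The sketch below row-reduces the augmented matrix whose first three columns are $v_1$, $v_2$, $v_3$ and whose last column is $w$, and then looks for a row that reads $0 = c$ with $c \neq 0$. The function names `forward_eliminate` and `is_consistent` are our own, and the routine is a bare-bones illustration of elimination under these assumptions, not a general-purpose solver.

```python
from fractions import Fraction

def forward_eliminate(aug):
    """Reduce an augmented matrix [A | b] to echelon form with exact arithmetic."""
    rows = [[Fraction(entry) for entry in row] for row in aug]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols - 1):  # never pivot on the right-hand side column
        # Find a row at or below pivot_row with a nonzero entry in this column.
        candidates = [r for r in range(pivot_row, n_rows) if rows[r][col] != 0]
        if not candidates:
            continue
        r = candidates[0]
        rows[pivot_row], rows[r] = rows[r], rows[pivot_row]
        # Subtract multiples of the pivot row from every row below it.
        for below in range(pivot_row + 1, n_rows):
            factor = rows[below][col] / rows[pivot_row][col]
            rows[below] = [x - factor * p
                           for x, p in zip(rows[below], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return rows

def is_consistent(aug):
    """Return False exactly when elimination produces a row 0 = c with c != 0."""
    for row in forward_eliminate(aug):
        if all(x == 0 for x in row[:-1]) and row[-1] != 0:
            return False
    return True

# Columns 1-3 are v1, v2, v3 from Example 3.1.20; the last column is w.
augmented = [[1, 1, 2, 1],
             [1, 3, -2, 2],
             [1, 1, 2, 2]]

print(forward_eliminate(augmented)[-1])  # the last row encodes the false equation
print(is_consistent(augmented))
```

The last reduced row has zero coefficients and a nonzero right-hand side, which is the computational counterpart of the false equation $1 = 0$ found above.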


Notice that every time we ask whether a vector $w$ is a linear combination of other vectors
$v_1, v_2, \ldots, v_n$, we seek to solve an equation like

\[
\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = w.
\]

We can ask the very similar question:

Are there scalars $\alpha_1, \alpha_2, \ldots, \alpha_n, \beta$ so that $\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n + \beta w = 0$?

If we find a solution where $\beta \neq 0$, we know that we can solve for $w$: dividing by $\beta$ gives
$w = -\frac{\alpha_1}{\beta} v_1 - \frac{\alpha_2}{\beta} v_2 - \cdots - \frac{\alpha_n}{\beta} v_n$. So, $w$ can be written as a linear
combination of $v_1, v_2, \ldots, v_n$. If $\beta$ must be zero, then we know that $w$ cannot be written as a linear
combination of the $v_i$'s. In the next section, we study the homogeneous equation, $Ax = 0$.

3.1.4 The Matrix Equation Ax = 0

Recall that in Definition 2.4.11, we defined a linear equation to be homogeneous if it has the form

\[
a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = 0.
\]

We now turn our attention to systems of homogeneous equations.


A system of equations made of only homogeneous equations is, then, called a homogeneous system
of equations.

Example 3.1.21 The system of equations

\[
\begin{aligned}
2x_1 + 3x_2 - 2x_3 &= 0 \\
3x_1 - x_2 + x_3 &= 0 \\
x_1 - 2x_2 + 2x_3 &= 0
\end{aligned}
\]

is a homogeneous system of equations. On the other hand,

\[
\begin{aligned}
2x_1 + 3x_2 - 2x_3 &= 0 \\
3x_1 - x_2 + x_3 &= 1 \\
x_1 - 2x_2 + 2x_3 &= 0
\end{aligned}
\]

has two homogeneous equations (the first and third), but is not a homogeneous system because the
second equation is not homogeneous.


When considering a homogeneous system of equations with $n$ variables and $m$ equations, the
corresponding matrix equation is $Ax = 0$, for some $m \times n$ matrix $A$, where $0$ represents the vector
in $\mathbb{R}^m$ with all zero entries. Notice that there is always a solution to a homogeneous system of equations
(see Exercise 24). This also means that if $A = (a_{i,j})$ is the coefficient matrix for the system of equations,
then $Ax = 0$ always has a solution. We call this common solution the trivial solution.

When working with homogeneous systems of equations, we are then concerned with how many 
solutions (one or infinitely many) the system has. In this section, we will look at tools to determine the 
number of solutions. 

Recall that we find no solution to the system of equations 


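\[
\begin{aligned}
a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n &= b_1 \\
a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,n}x_n &= b_m
\end{aligned}
\]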


after using elimination steps that lead to an equation that is never true. That is, we reduce the system
until we arrive at a point where one equation looks like $0 = c$, where $c$ is not 0. This situation cannot
possibly occur with a homogeneous system of equations. (See Exercise 26.) Notice that if a system has
exactly one solution, then when we reduce the system, we arrive at a system of equations of the form

\[
\begin{aligned}
x_1 &= c_1 \\
x_2 &= c_2 \\
&\;\;\vdots \\
x_n &= c_n
\end{aligned}
\]

whose matrix equation is $I_n x = c$, where $I_n$ is the $n \times n$ matrix given by

\[
I_n = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & 1
\end{pmatrix}
\quad \text{and} \quad
c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}.
\]




If the system of equations corresponding to $Ax = b$ reduces to the system of equations $I_n x = c$,
then we will see exactly one solution to the system no matter what values were on the right-hand side
of the system.

A system of equations with fewer equations than variables has infinitely many solutions or no
solution. In the case of homogeneous systems of equations with fewer equations than variables, we can
immediately say that there are infinitely many solutions to the system without doing any work, because
a homogeneous system always has at least the trivial solution.

We will use these last two ideas to consider solutions to the equations Ax = b and Ax = 0. 


Theorem 3.1.22 
Let $A$ be an $n \times m$ matrix. The matrix equation $Ax = 0$ has only the trivial solution if and only
if the corresponding homogeneous system with coefficient matrix $A$ has only the trivial solution.


Proof. Let $A$ be an $n \times m$ matrix. We know, by Exercise 24, that the trivial solution is always a solution
to $Ax = 0$. So, if $Ax = 0$ has a unique solution, it must be the trivial solution. Applying
Corollary 3.1.16, we know that $Ax = 0$ has only the trivial solution if and only if the corresponding
system has only the trivial solution.


Theorem 3.1.22 characterizes homogeneous systems of equations with a unique (trivial) solution. 
We now consider the set of solutions to a homogeneous matrix equation with infinitely many solutions. 


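Example 3.1.23 Let $A$ be the matrix given by

\[
A = \begin{pmatrix} 1 & 1 & -2 \\ 2 & 1 & -1 \\ 4 & 3 & -5 \end{pmatrix}.
\]

We can find the set of all solutions to $Ax = 0$ by solving the corresponding system of equations

\[
\begin{aligned}
x + y - 2z &= 0 \\
2x + y - z &= 0 \\
4x + 3y - 5z &= 0.
\end{aligned}
\]

We can use the method of elimination (or matrix reduction) to find the set of solutions. We will leave
off the intermediate steps and give the reduced echelon form of the corresponding augmented matrix:

\[
\left(\begin{array}{ccc|c} 1 & 1 & -2 & 0 \\ 2 & 1 & -1 & 0 \\ 4 & 3 & -5 & 0 \end{array}\right)
\sim
\left(\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).
\]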


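The corresponding system of equations is

\[
\begin{aligned}
x + z &= 0 \\
y - 3z &= 0.
\end{aligned}
\]

This means that $x = -z$ and $y = 3z$ for any $z \in \mathbb{R}$. So, the solution set can be written as

\[
S = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3 \;\middle|\; x = -z,\ y = 3z \right\}
= \left\{ \begin{pmatrix} -z \\ 3z \\ z \end{pmatrix} \;\middle|\; z \in \mathbb{R} \right\}.
\]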


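We claim that $S$ is actually a vector space. Indeed, it is a subspace of $\mathbb{R}^3$. This follows either by
appealing to Theorem 2.5.20 or the direct proof below.

Proof. Let $u, v \in S$ and let $\alpha, \beta \in \mathbb{R}$ be scalars. Then,

\[
u = \begin{pmatrix} -z_1 \\ 3z_1 \\ z_1 \end{pmatrix}
\quad \text{and} \quad
v = \begin{pmatrix} -z_2 \\ 3z_2 \\ z_2 \end{pmatrix}
\]

for some $z_1, z_2 \in \mathbb{R}$. Then

\[
\begin{aligned}
\alpha u + \beta v
&= \alpha \begin{pmatrix} -z_1 \\ 3z_1 \\ z_1 \end{pmatrix}
 + \beta \begin{pmatrix} -z_2 \\ 3z_2 \\ z_2 \end{pmatrix} \\
&= \begin{pmatrix} -\alpha z_1 - \beta z_2 \\ 3\alpha z_1 + 3\beta z_2 \\ \alpha z_1 + \beta z_2 \end{pmatrix} \\
&= \begin{pmatrix} -(\alpha z_1 + \beta z_2) \\ 3(\alpha z_1 + \beta z_2) \\ (\alpha z_1 + \beta z_2) \end{pmatrix} \in S.
\end{aligned}
\]

Therefore, by Theorems 2.5.3 and 2.5.5, $S$ is a subspace of $\mathbb{R}^3$.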
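
Since the intermediate elimination steps of Example 3.1.23 were left off, a short computation can confirm the reduced echelon form. The following Python sketch is a minimal row-reduction routine using exact fractions. The function name `rref` is our own choice and the code is an illustrative assumption about how one might carry out the reduction, not the procedure used in the text.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced echelon form of a matrix, using exact fractions."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Look for a nonzero entry in this column at or below pivot_row.
        r = next((i for i in range(pivot_row, n_rows) if rows[i][col] != 0), None)
        if r is None:
            continue
        rows[pivot_row], rows[r] = rows[r], rows[pivot_row]
        # Scale so the leading entry is 1, then clear the rest of the column.
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        for i in range(n_rows):
            if i != pivot_row and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [x - factor * p
                           for x, p in zip(rows[i], rows[pivot_row])]
        pivot_row += 1
    return rows

A = [[1, 1, -2],
     [2, 1, -1],
     [4, 3, -5]]

# The zero right-hand side never changes, so reducing A alone is enough.
print(rref(A) == [[1, 0, 1], [0, 1, -3], [0, 0, 0]])

# Every vector (-z, 3z, z) should satisfy Ax = 0; try a few values of z.
for z in [-2, 0, 1, 5]:
    x = [-z, 3 * z, z]
    print([sum(a * xi for a, xi in zip(row, x)) for row in A] == [0, 0, 0])
```

The sample values of $z$ are arbitrary; any real value gives a solution, in line with the description of $S$ above.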


This example leads to the next definition. 


Definition 3.1.24 


The set of solutions to a system of equations (or the corresponding matrix equation) is called a 
solution space whenever it is also a vector space. 


The previous example leads us to ask, “Under what conditions will a solution set be a solution 
space?” We answer the question immediately with the following theorem. 


Theorem 3.1.25 
Let $A$ be an $m \times n$ matrix and $b \in \mathbb{R}^m$. The set of solutions to the matrix equation $Ax = b$ (or
the corresponding system of equations) is a solution space if and only if $b = 0$.


Proof. Let $A$ be an $m \times n$ matrix and let $b \in \mathbb{R}^m$. Define the set of solutions of $Ax = b$ as

\[
S = \{v \in \mathbb{R}^n \mid Av = b\}.
\]

We will show two statements:

($\Rightarrow$) If $S$ is a subspace, then $b = 0$.
($\Leftarrow$) If $b = 0$, then $S$ is a subspace of $\mathbb{R}^n$.

($\Rightarrow$) We will prove this statement by proving the contrapositive (see Appendix C). That is, we will
prove that if $b \in \mathbb{R}^m$ is not the zero vector, then $S$ is not a vector space. Notice that for any $m \times n$
matrix $A$,

\[
A \cdot 0 = 0 \neq b. \tag{3.8}
\]

Therefore, $0 \notin S$. Thus, $S$ is not a vector space. So, if $S$ is a solution space to $Ax = b$, then $b = 0$.

($\Leftarrow$) Now, assume $b \in \mathbb{R}^m$ is the zero vector. Then the set $S$ is the solution set of a system of
homogeneous equations, and hence by Theorem 2.5.20, it is a subspace of $\mathbb{R}^n$ and therefore a solution
space.


In Equation 3.8 above, we use the symbol “0” in two ways. The $0$ on the left side of the equation
is a vector in $\mathbb{R}^n$, but the $0$ on the right side of the equation is a vector in $\mathbb{R}^m$.

In the proof of Theorem 3.1.25, we use the fact that there is a distributive property for matrix
multiplication over vector addition. We also use the property that for any $m \times n$ matrix $M$, scalar $\alpha$, and
vector $x \in \mathbb{R}^n$, we can write $M(\alpha x) = \alpha M x$. These two properties are important properties of
matrix-vector products that we will explore later.

We can visualize 3-dimensional vectors, so let us consider solution sets in $\mathbb{R}^3$. The next examples
show, visually, the difference between a solution set that is not a solution space and one that is.

We first compare the solution set to a single inhomogeneous equation in three variables, and the 
solution set to the associated homogeneous equation. 


Example 3.1.26 Consider the two equations

\[ x + 2y - z = 1 \]

and

\[ x + 2y - z = 0. \]

We can solve for $z$ in both equations to get

\[ z = x + 2y - 1 \]

and

\[ z = x + 2y. \]

It is apparent that these two solution sets are parallel planes, and that the solution to the homogeneous
system goes through the origin, while the solution to the inhomogeneous system is shifted down (in
the negative z-direction) by a distance of 1.
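
The shift between the two planes in Example 3.1.26 can be seen by sampling a few points. In the Python sketch below, the function names and the sample points are our own illustrative choices. Each function returns the height $z$ of one plane above a point $(x, y)$, as obtained by solving the corresponding equation for $z$.

```python
def z_inhomogeneous(x, y):
    """Height of the plane x + 2y - z = 1 above the point (x, y)."""
    return x + 2 * y - 1

def z_homogeneous(x, y):
    """Height of the plane x + 2y - z = 0 above the point (x, y)."""
    return x + 2 * y

sample_points = [(0, 0), (1, 0), (0, 1), (-3, 2), (5, -4)]

# The gap in the z-direction between the planes at each sample point.
gaps = [z_inhomogeneous(x, y) - z_homogeneous(x, y) for x, y in sample_points]
print(gaps)

# Check which of the two planes contains the origin (0, 0, 0).
print(z_homogeneous(0, 0) == 0, z_inhomogeneous(0, 0) == 0)
```

A constant gap at every sample point is what we expect from parallel planes, and only the homogeneous plane passes through the origin.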


Let’s look at a slightly more complicated example to get a sense of the patterns that result. 


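Example 3.1.27 By Theorem 3.1.25, we know that the solution set to $Ax = b$, where

\[
A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 4 & 2 \\ 3 & 6 & 1 \end{pmatrix}
\quad \text{and} \quad
b = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},
\]

is not a solution space. Indeed, the solution set is

\[
\begin{aligned}
S &= \{x \mid Ax = b\} \\
&= \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\;
\begin{pmatrix} 1 & 2 & -1 \\ 2 & 4 & 2 \\ 3 & 6 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \right\} \\
&= \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\;
x + 2y - z = 1,\ 2x + 4y + 2z = 2,\ \text{and}\ 3x + 6y + z = 3 \right\} \\
&= \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\;
x + 2y = 1\ \text{and}\ z = 0 \right\}.
\end{aligned}
\]

Notice that $0 \in \mathbb{R}^3$ is not an element of $S$. Geometrically, the solution set is the line in $\mathbb{R}^3$ where the
planes

\[
\left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x + 2y - z = 1 \right\}
\quad \text{and} \quad
\left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; 2x + 4y + 2z = 2 \right\}
\]

intersect. This line does not pass through the origin.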
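
Example 3.1.28 Also, from Theorem 3.1.25, we know that the set of solutions to $Ax = b$, where

\[
A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 4 & 2 \\ 3 & 6 & 1 \end{pmatrix}
\quad \text{and} \quad
b = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},
\]

is a vector space. In this case, the solution set is

\[
\begin{aligned}
S &= \{x \mid Ax = b\} \\
&= \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\;
\begin{pmatrix} 1 & 2 & -1 \\ 2 & 4 & 2 \\ 3 & 6 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \right\} \\
&= \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\;
x + 2y - z = 0,\ 2x + 4y + 2z = 0,\ \text{and}\ 3x + 6y + z = 0 \right\} \\
&= \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\;
x + 2y = 0\ \text{and}\ z = 0 \right\}.
\end{aligned}
\]

Geometrically, the solution space for this example is the line where the planes

\[
\left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x + 2y - z = 0 \right\}
\quad \text{and} \quad
\left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; 2x + 4y + 2z = 0 \right\}
\]

intersect. This line passes through the origin and is parallel to the
line that describes the solution set found in Example 3.1.27.

In both cases, the solution sets are parallel to each other, and the homogeneous solution set passes
through the origin. It turns out that this is always the case.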


3.1.5 The Principle of Superposition 


We saw in Examples 3.1.26, 3.1.27, and 3.1.28 that the solution set of a matrix equation (or its 
corresponding system of equations) is parallel to the solution set of the homogeneous equation (or 
system) with the same coefficient matrix. This is called the principle of superposition. We state it more 
clearly in the next theorem. 


Theorem 3.1.29 (Superposition) 
Let $A$ be an $m \times n$ matrix and $b \in \mathbb{R}^m$. Let $x \in \mathbb{R}^n$ satisfy $Ax = b$. Then $Az = b$ if and only if
$z = x + y$ for some $y \in \mathbb{R}^n$ satisfying $Ay = 0$.


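Proof. Let $x, z \in \mathbb{R}^n$. Suppose also that $Ax = b$ and $Az = b$. We will show that $z - x$ is a solution to
$Ay = 0$. Indeed,

\[
\begin{aligned}
A(z - x) &= Az - Ax \\
&= b - b = 0.
\end{aligned}
\]

Let $y = z - x$. Then $y \in \mathbb{R}^n$, $Ay = 0$, and $z = x + y$. Now, suppose $x, y \in \mathbb{R}^n$ satisfy $Ax = b$ and
$Ay = 0$. Suppose also that $z = x + y$. Then

\[
\begin{aligned}
Az &= A(x + y) \\
&= Ax + Ay \\
&= b + 0.
\end{aligned}
\]

Therefore $Az = b$.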


The principle of superposition tells us that if you know one solution to Ax = b, you can find other 
solutions to the inhomogeneous equation by adding solutions to the homogeneous equation. 
Theorem 3.1.29 can be restated in terms of the solution set to the inhomogeneous matrix equation. 


Theorem 3.1.30 (Superposition—Restated) 
Suppose $A$ is an $m \times n$ matrix, $b \in \mathbb{R}^m$, and $z \in \mathbb{R}^n$ satisfies $Az = b$. Then the solution set to
$Ax = b$ is

\[
S = \{x = z + y \mid Ay = 0\}.
\]

To show Theorems 3.1.29 and 3.1.30 are equivalent, we must show

\[
\text{Theorem 3.1.29} \implies \text{Theorem 3.1.30}
\]

and

\[
\text{Theorem 3.1.30} \implies \text{Theorem 3.1.29}.
\]


We leave this proof to Exercise 25. 

What does this have to do with parallel solution sets? The solution set to the homogeneous equation 
can be translated by any vector that solves Ax = b to produce the entire solution set to the equation 
Ax = b. In Exercises 13 and 14, we ask you to draw some examples of solution sets to see how they 
relate to the Principle of Superposition. 
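
The translation described above can be tried out on Examples 3.1.27 and 3.1.28. In the Python sketch below, `particular` is one solution of $Ax = b$ and `homogeneous(t)` produces solutions of $Ax = 0$. Both names, the particular solution $(1, 0, 0)$, and the parametrization $(-2t, t, 0)$ are our own choices, read off from the descriptions $x + 2y = 1$, $z = 0$ and $x + 2y = 0$, $z = 0$ in those examples.

```python
def matvec(A, x):
    """Matrix-vector product Ax, computed row by row."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Matrix and right-hand side from Example 3.1.27.
A = [[1, 2, -1],
     [2, 4, 2],
     [3, 6, 1]]
b = [1, 2, 3]

# One solution of Ax = b (it satisfies x + 2y = 1 and z = 0).
particular = [1, 0, 0]

def homogeneous(t):
    """A solution of Ax = 0 from Example 3.1.28: x + 2y = 0 and z = 0."""
    return [-2 * t, t, 0]

for t in [-3, 0, 1, 4]:
    y = homogeneous(t)
    translated = [p + yi for p, yi in zip(particular, y)]
    # y solves the homogeneous equation, and particular + y solves Ax = b.
    print(matvec(A, y) == [0, 0, 0], matvec(A, translated) == b)
```

Each value of $t$ moves along the homogeneous line, and adding `particular` carries that point onto the parallel line of solutions to $Ax = b$.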

We can also get a solution to the homogeneous equation using two solutions to the inhomogeneous 
equation Ax = b. 


Corollary 3.1.31 
Let $A$ be an $m \times n$ matrix and $b \in \mathbb{R}^m$. Suppose $u, v \in \mathbb{R}^n$ are solutions to $Ax = b$. Then $u - v$
is a solution to $Ax = 0$.


We leave the proof of this corollary as Exercise 27. 

Suppose we are given an $m \times n$ matrix $A$ and vector $b \in \mathbb{R}^m$. Using the Principle of Superposition,
we can make a connection between the number of solutions to $Ax = 0$ and $Ax = b$. In fact, every
solution to $Ax = b$ is linked to at least one solution of $Ax = 0$. The following corollary states this
connection.


Corollary 3.1.32 
Let $A$ be an $m \times n$ matrix and $b \in \mathbb{R}^m$. Suppose also that $v \in \mathbb{R}^n$ is a solution to $Ax = b$. Then
$Ax = b$ has a unique solution if and only if $Ax = 0$ has only the trivial solution.


Proof. Let $A$ be an $m \times n$ matrix and $b \in \mathbb{R}^m$. Let $S$ be the solution set of $Ax = b$.
($\Rightarrow$) Assume $S = \{v\}$. We will show that $Ax = 0$ has exactly one solution. Suppose, to the contrary, that
$x_1, x_2 \in \mathbb{R}^n$ are solutions to $Ax = 0$ with $x_1 \neq x_2$. Then by Theorem 3.1.30, $v + x_1, v + x_2 \in S$, but
$v + x_1 \neq v + x_2$, a contradiction. Therefore, the only solution to $Ax = 0$ is the trivial solution.
($\Leftarrow$) Now suppose that $Ax = 0$ has only the trivial solution. Then, by Theorem 3.1.30,
$S = \{v + y \mid Ay = 0\} = \{v + 0\} = \{v\}$. Therefore $Ax = b$ has a unique solution.


In Corollary 3.1.32, we assume that Ax = b has at least one solution. We need this assumption 
because Ax = 0 always has a solution (the trivial solution) even in the case where Ax = b does not. 
So, the connection between solution sets only exists if Ax = b has a solution. 

Geometrically, we can see the connection in $\mathbb{R}^3$ in Figure 3.1. Here, we see that the superposition
principle tells us that we can find solutions to $Ax = b$ through translations from solutions to $Ax = 0$
by a vector $v$ that satisfies $Av = b$.

Theorem 3.1.29 is very useful in situations similar to the following example. 


Fig. 3.1 The Principle of Superposition in $\mathbb{R}^3$ showing a translation from the plane described by $Ax = 0$ to the plane
described by $Ax = b$.


Example 3.1.33 Suppose we are given some radiographic data $b \in \mathbb{R}^m$ that we know was obtained
from some object data $u \in \mathbb{R}^n$. Suppose, also, that we know that the data came from multiplying a
matrix $A$ by $u$. That is, we know

\[ Au = b. \]


As is typical in tomographic problems (See Chapter 1), we want to find u so that we know what makes 
up the object that was radiographed. If Ax = b has infinitely many solutions, we may struggle to know 
which solution is the correct solution. Because there are infinitely many solutions, it could take a very 
long time to go through them. 

In situations like this, we may want to apply Theorem 3.1.29 and some a priori (previously known) 
information. Exercise 23 asks you to explore a solution path to this problem. 


Watch Your Language!
Since we have drawn many connections between systems of equations and matrices, we close this section
with a note about the language we use for solving systems of equations and reducing matrices.
We say


✓ We reduced the augmented matrix to solve the system of equations.
✓ We solved the system of equations with coefficient matrix A.
✓ We solved the matrix equation Ax = b.


We do not say


✗ We solved the augmented matrix.
✗ We solved the matrix A.

3.1.6 Exercises

1. Write the solution set to the following system as a linear combination of one or more solutions.

\[
\begin{aligned}
x - y - z &= 0 \\
2x - y + 3z &= 0 \\
-x - 2z &= 0
\end{aligned}
\]

2. Write the solution set to the following system as a linear combination of one or more solutions.

\[
\begin{aligned}
2x - y - 3z &= 0 \\
4x - 2y - 6z &= 0 \\
6x - 3y - 9z &= 0
\end{aligned}
\]

3. Write the solution set to the following system as a linear combination of one or more solutions.

\[
\begin{aligned}
x - 2y - 2z &= 0 \\
-x + 2y + 2z &= 0 \\
3x - 3y - 3z &= 0
\end{aligned}
\]


For Exercises 4 to 8, consider arbitrary vectors $u$, $v$, $w$ from a vector space $(V, +, \cdot)$.


4. Write a linear combination of the vectors $u$, $v$, and $w$.

5. Is $3u - 4w + 5$ a linear combination of $u$ and $w$? Justify your response.

6. Is $5u - 2v$ a linear combination of $u$, $v$, and $w$? Justify your response.

7. Show that any linear combination of $u$, $v$, and $3u + 2v$ is a linear combination of $u$ and $v$ as well.

8. Is it possible to write a linear combination (with nonzero coefficients) of $u$, $v$, and $w$ as a linear
combination of just two of these vectors? Justify your response.


In Exercises 9 to 12, determine whether w can be written as a linear combination of u and v. If so, 
write the linear combination. If not, justify. 


9. [The vectors $u$, $v$, and $w$ for this exercise are not recoverable from the source.]

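10. $u = 3x^2 + 4x + 2$, $v = x - 5$, and $w = x^2 + 2x - 1$.


11. \[
u = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \quad
v = \begin{pmatrix} -1 \\ 0 \\ 3 \end{pmatrix}, \quad \text{and} \quad
w = \begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix}.
\]


12. $u = 3x^2 + x + 2$, $v = x^2 - 2x + 3$, and $w = -x^2 - 1$.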
13. Consider the $2 \times 2$ matrix $M$ and vector $b \in \mathbb{R}^2$ given by


12 2 
M= (33) and b = i: 
Show that x = (;) is a solution to Mx = b. Find all solutions to Mx = 0 and plot the solution 


set in $\mathbb{R}^2$. Use Theorem 3.1.29 or 3.1.30 to draw all solutions to $Mx = b$.
14. Consider the $3 \times 3$ matrix $M$ and vector $b \in \mathbb{R}^3$ given by

\[
M = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 3 & 3 & 2 \end{pmatrix}
\quad \text{and} \quad
b = \begin{pmatrix} 3 \\ -1 \\ 2 \end{pmatrix}.
\]

Show that

\[
x = \begin{pmatrix} -3 \\ 1 \\ 4 \end{pmatrix}
\]

is a solution to $Mx = b$. Find all solutions to $Mx = 0$ and plot the solution
set in $\mathbb{R}^3$. Use Theorem 3.1.29 or 3.1.30 to draw all solutions to $Mx = b$.


For each of Exercises 15 to 17, do the following. (a) Describe, geometrically, the set $X$ of all linear
combinations of the given vector or vectors, and (b) for the value of $n$ specified, either prove or provide
a counterexample to the statement: “$X$ is a subspace of $\mathbb{R}^n$.”


15. $n = 2$, $u = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.

16. $n = 3$, $u = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}$ and $v = \begin{pmatrix} 1 \\ 3 \\ -1 \end{pmatrix}$.

17. $n = 3$, $u = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}$ and $v = \begin{pmatrix} -1 \\ 2 \\ -1 \end{pmatrix}$.

18. Solve the system of equations

\[
\begin{aligned}
x - 2y + 3z &= 6 \\
x + 2y + 4z &= 3 \\
x + y + z &= 1
\end{aligned}
\]

and at each stage, sketch the corresponding planes.

19. A street vendor is selling three types of fruit baskets with the following contents.

              Apples   Pears   Plums
    Basket A    4        2       1
    Basket B    3        1       2
    Basket C    1        2       2

Carlos would like to purchase some combination of baskets that will give him equal numbers of
each fruit. Either show why this is not possible, or provide a solution for Carlos.

20. Consider the images in Example 3.1.2. Determine whether Image 3 is a linear combination of
Image 2 and Image B.

21. Consider the space of 7-bar LCD characters defined in Example 2.4.17. Define $D$ to be the set
of 10 digits seen in Figure 3.2, where $d_0$ is the character that displays a “0”, $d_1$ is the character
that displays a “1”, etc. Write the set of all linear combinations of the vectors dp and dg. Can any
other elements of $D$ be written as a linear combination of dp and dg? If so, which? Can any other
vectors of $D(Z_2)$ be written as a linear combination of dg and dg? If so, which?

22. Consider the given heat states in $H_4(\mathbb{R})$. (See Figure 3.3.)
Draw the result of the linear combination $3h_1 + 2h_4$, where $h_1$ is the top left heat state and $h_4$ is
the second heat state in the right column.

23. In Example 3.1.33, we discussed using a priori information alongside Theorem 3.1.29 to
determine a solution to $Ax = b$ that makes sense for a radiography/tomography problem. Suppose
that we expect the data for an object to represent density measurements at specific points
throughout the object. Discuss characteristics for a solution $u$ that likely make sense when
representing object data. That is, what type of values would you expect to find in the entries
of $u$? How can you use solutions to $Ax = 0$ to find another solution that has values closer to what
you expect?

Fig. 3.2 The set $D = \{d_0, d_1, d_2, d_3, d_4, d_5, d_6, d_7, d_8, d_9\}$. (The ten digits of a standard 7-bar LCD display.)


Fig. 3.3 Heat states in $H_4(\mathbb{R})$. (Each panel plots temperature $T$, from 0 to 1, against position $x$, from 0 to $L$.)


24. Show that every homogeneous system of equations has at least one solution. Why do we call this
solution the trivial solution? 

25. Prove that Theorem 3.1.29 and Theorem 3.1.30 are equivalent. 

26. Describe how we know that a homogeneous system of equations will never reduce to a system 
of equations where one or more equations are false. 

27. Prove Corollary 3.1.31. 

28. Recall that the phrase “if and only if” is used when two statements are equivalent. Up to this point,
we have been exploring connections between solutions to systems of linear equations, solutions
to matrix equations, and linear combinations of vectors. In Section 2.2 and in this chapter, we
have presented theorems and corollaries making these connections. Combine these ideas² to
complete the theorem below.


Theorem 3.1.34
Let $A$ be an $n \times n$ matrix and $b \in \mathbb{R}^n$. The following statements are equivalent.


• $Ax = 0$ has a unique solution.


Prove Theorem 3.1.34.

² Hint: one bullet should be about the reduced echelon form of $A$, one should be about a system of equations, one should
be about linear combinations of the columns of $A$, and one should be about the matrix equation $Ax = b$.


29. State whether each of the following statements is true or false and supply a justification.


(a) A system of equations with $m$ equations and $n$ variables, where $m > n$, must have at least one
solution.

(b) A system of equations with $m$ equations and $n$ variables, where $m < n$, must have infinitely
many solutions.

(c) Every subspace of $\mathbb{R}^n$ can be represented by a system of linear equations.


3.2 Span


Let us, again, consider the example of radiographic images. Suppose, as we did in Section 3.1, that a
subspace of radiographs all have a particular property of interest. Because this subspace is not a finite
set, and due to the limits on data storage, it makes sense to know whether a (potentially small) set of
radiographs is enough to reproduce, through linear combinations, this (very large) set of radiographs
holding an important property. If so, we store the smaller set and are able to reproduce any radiograph in
this subspace by choosing the linear combination we want. We can also determine whether a particular
radiograph has the property of interest by deciding whether it can be written as a linear combination
of the elements in the smaller set.

We considered a similar, yet smaller, example in Section 2.1, using Images A, B, and C (also found
in Figure 3.4). Our task was to determine whether certain other images could be represented as linear
combinations of these three. Let us define the set $S$ of $4 \times 4$ grayscale images that can be obtained
through linear combinations of Images A, B, and C. That is, we define

\[
\begin{aligned}
S &= \{I \in \mathcal{I}_{4 \times 4} \mid I \text{ is a linear combination of Images } A, B, \text{ and } C\} \\
&= \{\alpha A + \beta B + \gamma C \in \mathcal{I}_{4 \times 4} \mid \alpha, \beta, \gamma \in \mathbb{R}\}.
\end{aligned}
\]

Fig. 3.4 Images A, B, and C of Example 3.1.2.


In Section 2.1, we saw that Image 3 (See Figure 3.5) is in the set S, but Image 4 is not in S. If Image 
4 is important to our work, we need to know that it is not in the set S. Or we may simply be interested in 
exploring the set S because we know that Images A, B, and C represent an important subset of images. 

In this section we will study the set of all linear combinations of a set of vectors. We will define 
span and clarify the various ways the word span is used in linear algebra. We will discuss this word as 
a noun, a verb, and as an adjective. Indeed, the term span in linear algebra is used in several contexts. 
The following are all accurate and meaningful uses of the word span that we will discuss in this section. 

The set X is the span of vectors u, v, and w. 

The vectors x and y span the subspace W. 

T is a spanning set for a vector space V. 


Fig. 3.5 Image 3 is in $S$, but Image 4 is not.


3.2.1 The Span of a Set of Vectors 


We begin by considering a set $X$ of vectors. As discussed above, we want to know more about the set
of all linear combinations of $X$. We call this set the span of $X$:

Definition 3.2.1


(n.) Let $(V, +, \cdot)$ be a vector space and let $X \subseteq V$. The span of the set $X$, denoted $\operatorname{Span} X$, is the
set of all linear combinations of the elements of $X$. In addition, we define $\operatorname{Span} \emptyset := \{0\}$.


It is important to note that the object, $\operatorname{Span} X$, is a set of vectors. The set $X$ may have finitely
many or infinitely many elements. If $X = \{v_1, v_2, \cdots, v_n\}$, we may also write $\operatorname{Span}\{v_1, v_2, \cdots, v_n\}$
to indicate $\operatorname{Span} X$. We can express this set as

\[
\operatorname{Span} X = \{\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n \mid \alpha_1, \alpha_2, \cdots, \alpha_n \in \mathbb{F}\}.
\]


Let us now consider some examples. 


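Example 3.2.2 In the example above, we defined $S$ to be the set of all linear combinations of Images
A, B, and C. We can now write

\[
S = \operatorname{Span}\{\text{Image A}, \text{Image B}, \text{Image C}\}.
\]

And, we know that

\[
\text{Image 3} \in \operatorname{Span}\{\text{Image A}, \text{Image B}, \text{Image C}\}
\quad \text{but} \quad
\text{Image 4} \notin \operatorname{Span}\{\text{Image A}, \text{Image B}, \text{Image C}\}.
\]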

Definition 3.2.1 includes a separate definition of the span of the empty set. This is a consequence 
of defining span in terms of linear combinations of vectors. One could equivalently define Span X as 
the intersection of all subspaces that contain X as is proved in Theorem 3.2.20. 


Example 3.2.3 Consider the two polynomials $x$ and $1$, both in $P_1(\mathbb{R})$. We have $\operatorname{Span}\{x, 1\} = \{ax +
b \mid a, b \in \mathbb{R}\}$. In this particular case, the span is equal to the whole space $P_1(\mathbb{R})$.


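Example 3.2.4 Now consider the vectors

\[
v_1 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \text{and} \quad
v_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
\]

in $\mathbb{R}^3$. We can find $W = \operatorname{Span}\{v_1, v_2, v_3\}$ by considering all linear combinations of $v_1$, $v_2$, and $v_3$.

\[
\begin{aligned}
\operatorname{Span}\{v_1, v_2, v_3\}
&= \left\{ \alpha \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}
+ \beta \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
+ \gamma \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \;\middle|\; \alpha, \beta, \gamma \in \mathbb{R} \right\} \\
&= \left\{ \begin{pmatrix} -\alpha + \beta \\ 0 \\ \alpha + \gamma \end{pmatrix} \;\middle|\; \alpha, \beta, \gamma \in \mathbb{R} \right\} \\
&= \left\{ \begin{pmatrix} a \\ 0 \\ b \end{pmatrix} \;\middle|\; a, b \in \mathbb{R} \right\} \\
&= \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\} \\
&= \operatorname{Span}\{v_2, v_3\}.
\end{aligned}
\]

Fig. 3.6 In Example 3.2.4, the same set (the xz-plane) is represented as the span of two different sets. Left:
$\operatorname{Span}\{v_1, v_2, v_3\}$, Right: $\operatorname{Span}\{v_2, v_3\}$.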


This example is interesting because it shows two different ways to write the same set as a span.
Another way to say this is that the set of all linear combinations of $v_1$, $v_2$, and $v_3$ is the same as the set
of all linear combinations of the vectors $v_2$ and $v_3$. Now, you likely wonder if there are other ways to
write this set as a span of other vectors. We leave this exercise to the reader. (See Exercise 8.)

When considering the span of vectors in $\mathbb{R}^n$, as we did in Example 3.2.4, it is helpful to consider
the geometry of linear combinations. Figure 3.6 shows that the span of $v_1$, $v_2$, and $v_3$ is the same as
the span of $v_2$ and $v_3$.
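
One computational way to test whether a vector lies in a span is to compare ranks: a vector $w$ lies in the span of a list of vectors exactly when appending $w$ to the list does not increase the rank. The Python sketch below applies this to Example 3.2.4. The rank criterion is a standard fact that we swap in here for illustration; it is not the method used in the text, and the helper names `rank` and `in_span` are our own.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (Gaussian elimination)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0])):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * p for x, p in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def in_span(w, vectors):
    """w lies in Span(vectors) exactly when adding w does not raise the rank."""
    return rank(vectors + [w]) == rank(vectors)

v1, v2, v3 = [-1, 0, 1], [1, 0, 0], [0, 0, 1]

# v1 = -v2 + v3, so v1 adds nothing new to Span{v2, v3}.
print(in_span(v1, [v2, v3]))
# Both sets of vectors have the same rank.
print(rank([v1, v2, v3]), rank([v2, v3]))
# A vector off the xz-plane is not in the span.
print(in_span([0, 1, 0], [v1, v2, v3]))
```

Because $v_1$ already lies in $\operatorname{Span}\{v_2, v_3\}$, including it cannot produce any new linear combinations, which is why the two spans in the example coincide.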


Example 3.2.5 Consider the vectors

\[
v_1 = 3x + 4, \quad v_2 = 2x + 1, \quad v_3 = x^2 + 2, \quad \text{and} \quad v_4 = x
\]

from Example 3.1.5. We found that the vector $v_1$ is in $\operatorname{Span}\{v_2, v_3, v_4\}$. We know this because $v_1$ can
be written as a linear combination of vectors $v_2$, $v_3$, and $v_4$.


3.2.2 To Span a Set of Vectors 


In the beginning of this section, we discussed the idea of finding a smaller set whose linear combinations 
form a bigger subspace that we find interesting. In the last section, we called the set of all linear 
combinations of a collection of vectors the span of the vectors. In this section, we describe the action 
of creating the span of a set of vectors. 


Definition 3.2.6 


(v.) Let $(V, +, \cdot)$ be a vector space and let $X$ be a (possibly infinite) subset of $V$. $X$ spans a set $W$
if $W = \operatorname{Span} X$. In this case, we also often say that the vectors in $X$ span the set $W$.


In general, it is important to distinguish between a set S and the elements of the set. However, in 
Definition 3.2.6 above, notice that we do not make a distinction: we can refer to both a set of vectors 
and the vectors themselves as spanning W. 

We continue by considering a few examples to illustrate this terminology. 


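Example 3.2.7 In Example 3.2.2, we discussed

$$\operatorname{Span}\{\text{Image A}, \text{Image B}, \text{Image C}\}.$$

In this example, we make clear that Images A, B, and C do not span the set $\mathcal{I}_{4\times 4}$ of $4 \times 4$ digital
images because

$$\text{Image 4} \in \mathcal{I}_{4\times 4} \quad \text{but} \quad \text{Image 4} \notin \operatorname{Span}\{\text{Image A}, \text{Image B}, \text{Image C}\}.$$

Therefore, {Image A, Image B, Image C} does not span $\mathcal{I}_{4\times 4}$.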


Example 3.2.8 In Example 3.2.3, we found that $\{x, 1\}$ spans $\mathcal{P}_1(\mathbb{R})$. We can also say that the vectors
$x$ and $1$ span the set $\mathcal{P}_1(\mathbb{R})$.


Example 3.2.9 Let us now consider the solution space V of the system of equations 


3x +4y+2z=0 
xty+z=0 
4x+5y+3z=0. 




We can write the solution set in different ways. The first just says that V is the solution space of the 
system of equations. 


$$V = \{(x, y, z)^T \mid 3x + 4y + 2z = 0,\ x + y + z = 0, \text{ and } 4x + 5y + 3z = 0\}.$$

Next, we can solve the system and write more clearly a description of the solutions.

$$V = \{(-2z, z, z)^T \mid z \in \mathbb{R}\}.$$

We recognize this as a parametrization of a line through the origin and the point $(-2, 1, 1)$ in $\mathbb{R}^3$. Now,
we can also write $V$ as a span.

$$V = \operatorname{Span}\{(-2, 1, 1)^T\}.$$

This means that we can say that $(-2, 1, 1)^T$ spans the set $V$. Or we can say that $V$ is spanned by the
set $\{(-2, 1, 1)^T\}$.
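
As a quick sanity check on this description, the sketch below (our own illustration in Python; the function name `in_solution_space` and the sample scalars are assumptions, not part of the text) tests that several scalar multiples of $(-2, 1, 1)^T$ satisfy all three equations of Example 3.2.9, and that one vector off the line does not.

```python
from fractions import Fraction

def in_solution_space(x, y, z):
    """True when (x, y, z) satisfies all three equations of Example 3.2.9."""
    return (3*x + 4*y + 2*z == 0
            and x + y + z == 0
            and 4*x + 5*y + 3*z == 0)

spanning_vector = (-2, 1, 1)

# Every scalar multiple t * (-2, 1, 1) should solve the system.
multiples_ok = all(
    in_solution_space(*(t * c for c in spanning_vector))
    for t in [Fraction(-7, 3), 0, 1, Fraction(5, 2), 10]
)

# A vector that is not a multiple of (-2, 1, 1), such as (1, 0, -1),
# should violate at least one equation.
off_line_ok = in_solution_space(1, 0, -1)
```

The check only confirms that the span sits inside $V$; the reverse inclusion comes from solving the system, as done above.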


Let us now return to the matrix equation Ax = b. Consider the following example. 


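Example 3.2.10 Consider, again, the question posed in Example 3.1.4. We asked whether it was
possible to write $w = \begin{pmatrix} 2 \\ 5 \\ 3 \end{pmatrix}$ as a linear combination of

$$v_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}, \quad \text{and} \quad v_3 = \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix}.$$

We found that the answer to this question can be found by solving the matrix equation $Ax = b$, where

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 1 & 1 & 4 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 2 \\ 5 \\ 3 \end{pmatrix}.$$

We found that $w$ could, indeed, be written as a linear combination of $v_1$, $v_2$, and $v_3$. We also found
that $Ax = b$ had a solution. This led to Theorem 3.1.18. Using the vocabulary in this section, we can
now say that $w$ is in the span of $v_1$, $v_2$, and $v_3$.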


The result in Example 3.2.10 can be stated for more general situations as the following theorem. 


Theorem 3.2.11
Let $A$ be an $m \times n$ matrix and let $b \in \mathbb{R}^m$. The matrix equation $Ax = b$ has a solution if and
only if $b$ is in the span of the columns of $A$.


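Proof. Let $A$ and $b$ be as defined above. In particular, let

$$A = \begin{pmatrix} | & | & & | \\ c_1 & c_2 & \cdots & c_n \\ | & | & & | \end{pmatrix},$$

where $c_1, c_2, \ldots, c_n \in \mathbb{R}^m$ are the columns of $A$.

($\Rightarrow$) Suppose that $Ax = b$ has a solution,

$$x = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}.$$

Then, using the understanding of matrix multiplication from Section 3.1, we can write

$$b = Ax = \begin{pmatrix} | & | & & | \\ c_1 & c_2 & \cdots & c_n \\ | & | & & | \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} = \alpha_1 c_1 + \alpha_2 c_2 + \cdots + \alpha_n c_n.$$

Thus, $b \in \operatorname{Span}\{c_1, c_2, \ldots, c_n\}$.

($\Leftarrow$) Now, suppose that $b$ is in the span of the columns of $A$. That is, there are scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$
so that

$$b = \alpha_1 c_1 + \alpha_2 c_2 + \cdots + \alpha_n c_n.$$

Again, using the understanding of matrix-vector products from Section 3.1, we know that

$$b = \begin{pmatrix} | & | & & | \\ c_1 & c_2 & \cdots & c_n \\ | & | & & | \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}.$$

Therefore, $(\alpha_1, \alpha_2, \ldots, \alpha_n)^T$ is a solution to $Ax = b$.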
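
To make Theorem 3.2.11 concrete, the sketch below solves $Ax = b$ for the matrix and vector of Example 3.2.10 and then rebuilds $b$ as a linear combination of the columns of $A$, using the entries of the solution as weights. It is a minimal illustration in Python with exact fractions; the function name `solve_square` is our own, and the routine assumes a square system with a unique solution (it simply returns `None` otherwise).

```python
from fractions import Fraction

def solve_square(A, b):
    """Solve Ax = b for a square A with a unique solution, by Gauss-Jordan elimination.

    Uses exact Fraction arithmetic; returns None if some column has no pivot.
    """
    n = len(A)
    # Build the augmented matrix [A | b].
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # no unique solution: this simple sketch gives up
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

A = [[1, 2, 3],
     [0, 2, 1],
     [1, 1, 4]]
b = [2, 5, 3]

x = solve_square(A, b)

# Rebuild b as x[0]*c1 + x[1]*c2 + x[2]*c3, where cj is column j of A.
columns = [[row[j] for row in A] for j in range(3)]
rebuilt = [sum(x[j] * columns[j][i] for j in range(3)) for i in range(3)]
```

Here `rebuilt` agrees with `b`, which is exactly the statement that $b$ lies in the span of the columns of $A$.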


We now move to different uses of the word span. 


A Spanning Set 
A set that spans another set can also be described using the word span. Here, we introduce the adjective 
form to describe a set that spans another set. 


Definition 3.2.12 


We say that a (possibly infinite) set $S$ is a spanning set for $W$ if $W = \operatorname{Span} S$.

Let us consider how this use of the word can be applied to our previous example.


Example 3.2.13 In Example 3.2.9, we found that $\{(-2, 1, 1)^T\}$ spans the solution space $V$. This means
that $\{(-2, 1, 1)^T\}$ is a spanning set for $V$.


In Definition 3.1.1 we made explicit that a linear combination has finitely many terms, but in 
Definition 3.2.12, we allow the spanning set to have infinitely many elements. In the next example, we 
clarify this. 


Example 3.2.14 Let $S = \{1, x, x^2, x^3, \ldots\}$. Since every polynomial is a linear combination of (finitely
many³) elements of $S$, we see that $\mathcal{P}_\infty(\mathbb{R}) \subseteq \operatorname{Span} S$. Also, every element of $\operatorname{Span} S$ is a polynomial,
so $\operatorname{Span} S \subseteq \mathcal{P}_\infty(\mathbb{R})$. Hence, $\operatorname{Span} S = \mathcal{P}_\infty(\mathbb{R})$, and so $S$ is a spanning set for $\mathcal{P}_\infty(\mathbb{R})$.


Example 3.2.15 Going back to Example 3.2.3, again, we can now say that $\{x, 1\}$ is a spanning set for
$\mathcal{P}_1(\mathbb{R})$.


As we saw in earlier sections, spanning sets are not unique. Here is another spanning set for $\mathcal{P}_1(\mathbb{R})$.


Example 3.2.16 Show that $\{x + 1, x - 2, 4\}$ spans $\mathcal{P}_1(\mathbb{R})$.


1. First show that $\operatorname{Span}\{x + 1, x - 2, 4\} \subseteq \mathcal{P}_1(\mathbb{R})$ by showing that an arbitrary vector in the span is
also a polynomial of degree 1 or less.
Indeed, we know that if $p \in \operatorname{Span}\{x + 1, x - 2, 4\}$ then there exist scalars $\alpha$, $\beta$, and $\gamma$ such that

$$p = \alpha(x + 1) + \beta(x - 2) + \gamma(4).$$

Now, $p = (\alpha + \beta)x + (\alpha - 2\beta + 4\gamma)$, which is a vector in $\mathcal{P}_1(\mathbb{R})$. Thus, $\operatorname{Span}\{x + 1, x - 2, 4\} \subseteq \mathcal{P}_1(\mathbb{R})$.
2. Next show that $\mathcal{P}_1(\mathbb{R}) \subseteq \operatorname{Span}\{x + 1, x - 2, 4\}$ by showing that an arbitrary polynomial of degree
one or less can be expressed as a linear combination of vectors in $\{x + 1, x - 2, 4\}$.

If $p \in \mathcal{P}_1(\mathbb{R})$, then $p = ax + b$ for some $a, b \in \mathbb{R}$. We want to show that $p \in \operatorname{Span}\{x + 1, x - 2, 4\}$.
That is, we want to show that there exist $\alpha, \beta, \gamma \in \mathbb{R}$ so that

$$p = \alpha(x + 1) + \beta(x - 2) + \gamma(4).$$

If such scalars exist, then as before, we can match up like terms to get the system of equations:

$$\begin{aligned}
(x \text{ term:}) \quad & a = \alpha + \beta, \\
(\text{constant term:}) \quad & b = \alpha - 2\beta + 4\gamma.
\end{aligned}$$

Thus, if $\alpha = \frac{2a + b}{3}$, $\beta = \frac{a - b}{3}$, and $\gamma = 0$, then

$$p = \alpha(x + 1) + \beta(x - 2) + \gamma(4).$$

(There are infinitely many solutions. The above solution is the particular solution in which $\gamma = 0$.)
So, for any $p \in \mathcal{P}_1(\mathbb{R})$, we can find such scalars. Thus, $\mathcal{P}_1(\mathbb{R}) \subseteq \operatorname{Span}\{x + 1, x - 2, 4\}$.

We now have $\mathcal{P}_1(\mathbb{R}) = \operatorname{Span}\{x + 1, x - 2, 4\}$ and, therefore, $\{x + 1, x - 2, 4\}$ spans $\mathcal{P}_1(\mathbb{R})$.


3 As we discussed in Example 3.1.3, functions like $\cos x$ that can only be written using a Taylor series (that is, using an
infinite number of the terms of $S$) are not in $\mathcal{P}_\infty(\mathbb{R})$; nor are they in $\operatorname{Span} S$.


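In the next example, we show that $\{x^2, 1\}$ is not a spanning set of $\mathcal{P}_2(\mathbb{R})$.


Example 3.2.17 Every linear combination of $x^2$ and $1$ is a polynomial in $\mathcal{P}_2(\mathbb{R})$, so $\operatorname{Span}\{x^2, 1\} \subseteq \mathcal{P}_2(\mathbb{R})$.
However, not every polynomial in $\mathcal{P}_2(\mathbb{R})$ can be written as a linear combination of $x^2$ and $1$;
for example, $x^2 + x + 1$. Thus, $\mathcal{P}_2(\mathbb{R}) \not\subseteq \operatorname{Span}\{x^2, 1\}$. That is, $\operatorname{Span}\{x^2, 1\} \neq \mathcal{P}_2(\mathbb{R})$.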


Example 3.2.18 Recall the vector space $D(Z_2)$ of 7-bar LCD characters from Example 2.4.17, where
white corresponds to the value zero and green corresponds to the value one. With element-wise
definitions of addition and scalar multiplication as defined for the field $Z_2$, $D(Z_2)$ is a vector space.
Here are two examples of vector addition in $D(Z_2)$:

[Figure: two examples of vector addition of 7-bar LCD characters in $D(Z_2)$.]

One can show that the set $D = \{d_0, d_1, \ldots, d_9\}$ of the ten digits shown in Figure 2.17 is a spanning
set for $D(Z_2)$. Thus, every character in $D(Z_2)$ can be written as a linear combination of the vectors
in $D$.


Because a spanning set will help us describe vector spaces with fewer vectors, it may be useful to 
find spanning sets for vector spaces of interest. In the next example, we illustrate a possible strategy 
for finding such a set. 


Example 3.2.19 Consider the vector space $\mathcal{P}_1(\mathbb{R})$. Suppose we haven't yet found a spanning set for
this vector space, but we want one. We know at least the following three things about a spanning set:

1. The spanning set contains elements of the set it spans.

2. Every element of the set is a linear combination of finitely many elements of the spanning set.

3. A set does not have a unique spanning set. (Many times, there are actually infinitely many spanning
sets.)

We can choose an element of $\mathcal{P}_1(\mathbb{R})$ to start. Let's choose

$$v_1 = x + 1.$$

We know that not every element of $\mathcal{P}_1(\mathbb{R})$ is a scalar multiple of $v_1$. So, $\operatorname{Span}\{v_1\} \neq \mathcal{P}_1(\mathbb{R})$. To find a
second vector, we can choose any element of $\mathcal{P}_1(\mathbb{R})$. Let's choose $v_2 = 2x + 2$. Since $v_2 = 2v_1$, we
know that any vector not a scalar multiple of $v_1$ (or of $v_2$) is not in $\operatorname{Span}\{v_1, v_2\}$. In order to span the
whole set, we need to find an element of $\mathcal{P}_1(\mathbb{R})$ that is not a scalar multiple of $v_1$. We can choose

$$v_3 = 2x + 1.$$

Since there is no scalar $\alpha$ so that $v_3 = \alpha v_1$, we know that $v_3 \notin \operatorname{Span}\{v_1, v_2\}$. Now, it may or may not
be clear whether we have $\mathcal{P}_1(\mathbb{R}) = \operatorname{Span}\{v_1, v_2, v_3\}$. We can keep adding vectors until all vectors of
$\mathcal{P}_1(\mathbb{R})$ are in the spanning set, but that would not be helpful. After each vector is added, we can check
whether there are any elements of $\mathcal{P}_1(\mathbb{R})$ that are still not in the span of the vectors we have chosen.
That is, we want to know if, for every $a, b \in \mathbb{R}$, we can find scalars $\alpha$, $\beta$, and $\gamma$ so that

$$ax + b = \alpha v_1 + \beta v_2 + \gamma v_3.$$

Matching up like terms, we want to know whether or not the system of equations below has a solution
for every choice of $a$ and $b$.

$$\begin{aligned}
\alpha + 2\beta + 2\gamma &= a \\
\alpha + 2\beta + \gamma &= b
\end{aligned}$$

Reducing this system, we find that there are infinitely many solutions for $\alpha$, $\beta$, and $\gamma$. For example, one
solution is $\alpha = 0$, $\beta = -\frac{1}{2}a + b$, and $\gamma = a - b$. This tells us that
$ax + b = 0 \cdot v_1 + \left(-\frac{1}{2}a + b\right) \cdot v_2 + (a - b) \cdot v_3$.
In other words, $ax + b \in \operatorname{Span}\{v_1, v_2, v_3\}$ for any $a, b \in \mathbb{R}$. Since $\operatorname{Span}\{v_1, v_2, v_3\} \subseteq \mathcal{P}_1(\mathbb{R})$
and $\mathcal{P}_1(\mathbb{R}) \subseteq \operatorname{Span}\{v_1, v_2, v_3\}$ (we just showed this part), $\mathcal{P}_1(\mathbb{R}) = \operatorname{Span}\{v_1, v_2, v_3\}$.
Thus, one spanning set of $\mathcal{P}_1(\mathbb{R})$ is $\{v_1, v_2, v_3\}$.
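
The claim that every $ax + b$ lies in the span can be spot-checked with a few lines of Python. This is a minimal sketch under our own conventions: a polynomial $c_1 x + c_0$ is stored as the pair `(c1, c0)`, the function names are illustrative, and the coefficients used are one particular solution of the system with $\alpha = 0$, namely $\beta = b - a/2$ and $\gamma = a - b$ (obtained by subtracting the second equation from the first).

```python
from fractions import Fraction

# Polynomials c1*x + c0 in P_1(R) are stored as (c1, c0).
v1, v2, v3 = (1, 1), (2, 2), (2, 1)   # x + 1, 2x + 2, 2x + 1

def coefficients(a, b):
    """One choice of (alpha, beta, gamma) with ax + b = alpha*v1 + beta*v2 + gamma*v3."""
    a, b = Fraction(a), Fraction(b)
    return Fraction(0), b - a / 2, a - b

def combine(alpha, beta, gamma):
    """Coefficient pair of alpha*v1 + beta*v2 + gamma*v3."""
    return tuple(alpha * p + beta * q + gamma * r for p, q, r in zip(v1, v2, v3))

# A handful of target polynomials ax + b, chosen arbitrarily for illustration.
targets = [(3, -1), (0, 5), (Fraction(1, 2), Fraction(7, 3)), (-4, -4)]
all_match = all(combine(*coefficients(a, b)) == (a, b) for a, b in targets)
```

Because $v_2 = 2v_1$, other choices of $\alpha$ and $\beta$ work equally well; the sketch checks only the solution with $\alpha = 0$.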


3.2.3 Span X is a Vector Space 


Because of the close connection between the definitions of span and the closure properties of a vector 
space, it is not surprising that the span of a set is also a vector space. The next theorem formalizes this 
idea. 


Theorem 3.2.19
Let $X$ be a subset of vector space $(V, +, \cdot)$. Then $\operatorname{Span} X$ is a subspace of $V$.


Proof. Suppose $X$ is a subset of vector space $(V, +, \cdot)$. We show that $0 \in \operatorname{Span} X$ and $\operatorname{Span} X$ is closed
under addition and scalar multiplication. Then, by Theorem 2.5.2, $\operatorname{Span} X$ is a subspace of $V$.

First, $0 \in \operatorname{Span} X$ because either $X = \emptyset$ and $\operatorname{Span} \emptyset = \{0\}$, or $0$ is the trivial linear combination (all
zero coefficients) of any finite collection of vectors in $X$.

Let $u, v \in \operatorname{Span} X$ and $\alpha \in \mathbb{F}$. Then $u$ and $v$ are linear combinations of vectors in $X$, and hence
for any scalar $\alpha$, $\alpha u + v$ is also a linear combination of vectors in $X$. Thus, $\operatorname{Span} X$ is closed under
addition and scalar multiplication and is hence a subspace of $V$.


Using Theorem 3.2.19, we can now give an alternate characterization of the span of a set of vectors. 


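Theorem 3.2.20
Let $(V, +, \cdot)$ be a vector space and let $X \subseteq V$ be a set of vectors. Then, $\operatorname{Span} X$ is the intersection
of all subspaces of $V$ that contain $X$.

Proof. Let $\mathcal{S}$ be the set of all subspaces containing $X$. That is, if $S \in \mathcal{S}$, then $S$ is a subspace of $V$ and
$X \subseteq S$. We use the notation $\bigcap_{S \in \mathcal{S}} S$ to denote the intersection of all subspaces⁴ in $\mathcal{S}$.
We need to show that $\operatorname{Span} X \subseteq \bigcap_{S \in \mathcal{S}} S$ and $\bigcap_{S \in \mathcal{S}} S \subseteq \operatorname{Span} X$.

First, suppose that $v \in \operatorname{Span} X$. Then $v$ is a linear combination of the elements of $X$. That is,

$$v = \sum_{k=1}^{m} a_k x_k$$

for some scalars $a_k$ and for $x_k \in X$. Since, for each $k = 1, 2, \ldots, m$, $x_k \in S$ for each $S$ containing $X$,
and since each such $S$ is a subspace of $V$, we know that $S$ is closed under linear combinations. Thus, $v \in S$
for every $S \in \mathcal{S}$. Therefore, $v \in \bigcap_{S \in \mathcal{S}} S$. Thus, $\operatorname{Span} X \subseteq \bigcap_{S \in \mathcal{S}} S$.

Now assume $w \in \bigcap_{S \in \mathcal{S}} S$. Then $w \in S$ for every $S \in \mathcal{S}$. Notice that

$$\operatorname{Span} X \in \mathcal{S},$$

since $\operatorname{Span} X$ is a subspace of $V$ (Theorem 3.2.19) that contains $X$. Thus, $w \in \operatorname{Span} X$. Therefore $\operatorname{Span} X = \bigcap_{S \in \mathcal{S}} S$.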


Theorem 3.2.19 might lead you to consider whether or not every vector space can be written as the
span of a set. The answer is "yes." Consider a vector space $V$. Because $V$ is closed under addition and
scalar multiplication, we know that every linear combination of vectors from $V$ is, again, in $V$. Thus,
$V = \operatorname{Span} V$.

Of course this is not very satisfying; a main motivation for considering span to begin with was the 
desire to have an efficient (that is, small) set of vectors that would generate (via linear combinations) 
a subspace of interest within our vector space. 

We will return to the important question of finding efficient spanning sets in upcoming sections. 

For now, here is a more satisfying example. 


Example 3.2.21 Let $V = \{(x, y, z) \in \mathbb{R}^3 \mid x + y + z = 0,\ 3x + 3y + 3z = 0\}$. We can write $V$ as a
span. Notice that $V$ is the solution set of the system of equations

$$\begin{aligned}
x + y + z &= 0 \\
3x + 3y + 3z &= 0.
\end{aligned}$$

We see that after elimination, we get the system

$$\begin{aligned}
x + y + z &= 0 \\
0 &= 0.
\end{aligned}$$

Thus $y$ and $z$ can be chosen to be free variables and we get $(x, y, z)^T = (-y - z, y, z)^T$. That is,

$$\begin{aligned}
V &= \{(-y - z, y, z)^T \mid y, z \in \mathbb{R}\} \\
&= \{y(-1, 1, 0)^T + z(-1, 0, 1)^T \mid y, z \in \mathbb{R}\} \\
&= \operatorname{Span}\{(-1, 1, 0)^T, (-1, 0, 1)^T\}.
\end{aligned}$$


4 This notation allows the possibility that there could be an uncountable number of subspaces in $\mathcal{S}$.


This example demonstrated that the parameterized form of a solution set is very useful in finding a 
spanning set. It also looks forward to the useful idea that any subspace of a vector space can be written 
as the span of some (usually finite) subset of the vector space. We will develop this idea more fully in 
the next two sections. 

Finally, we introduce the following theorem capturing relationships among subspaces that can be 
written as the span of some subset. 


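Theorem 3.2.22
Let $X$ and $Y$ be subsets of a vector space $(V, +, \cdot)$. Then the following statements hold.

(a) $\operatorname{Span}(X \cap Y) \subseteq (\operatorname{Span} X) \cap (\operatorname{Span} Y)$.
(b) $\operatorname{Span}(X \cup Y) \supseteq (\operatorname{Span} X) \cup (\operatorname{Span} Y)$.
(c) If $X \subseteq Y$, then $\operatorname{Span} X \subseteq \operatorname{Span} Y$.
(d) $\operatorname{Span}(X \cup Y) = (\operatorname{Span} X) + (\operatorname{Span} Y)$.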


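Proof. We prove (a) and leave the remainder as exercises.

Suppose $X$ and $Y$ are subsets of a vector space $(V, +, \cdot)$. First we consider the case where $X \cap Y = \emptyset$.
We have $\operatorname{Span}(X \cap Y) = \{0\}$. Note that $\operatorname{Span} X$ and $\operatorname{Span} Y$ are subspaces (Theorem 3.2.19), and the
intersection of subspaces is also a subspace (Theorem 2.5.18). As every subspace contains the zero
vector, $\operatorname{Span}(X \cap Y) \subseteq (\operatorname{Span} X) \cap (\operatorname{Span} Y)$.

Next, consider the case $X \cap Y \neq \emptyset$. Let $u$ be an arbitrary vector in $X \cap Y$. Then $u \in X$, $u \in Y$,
so $u \in \operatorname{Span} X$, $u \in \operatorname{Span} Y$, and thus, $u \in (\operatorname{Span} X) \cap (\operatorname{Span} Y)$. Since $(\operatorname{Span} X) \cap (\operatorname{Span} Y)$ is a
subspace, it is closed under linear combinations, so every linear combination of vectors in $X \cap Y$ also
lies in it. That is, $\operatorname{Span}(X \cap Y) \subseteq (\operatorname{Span} X) \cap (\operatorname{Span} Y)$.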


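Example 3.2.23 Consider the subsets of $\mathbb{R}^2$: $X = \{(1, 0)\}$ and $Y = \{(0, 1)\}$. We have $X \cap Y = \emptyset$ and
$\operatorname{Span}(X \cap Y) = \{0\}$. Also, $\operatorname{Span} X$ is the $x$-axis in $\mathbb{R}^2$ and $\operatorname{Span} Y$ is the $y$-axis in $\mathbb{R}^2$. So, $(\operatorname{Span} X) \cap (\operatorname{Span} Y) = \{0\}$
and statement (a) in Theorem 3.2.22 holds.

Also, $X \cup Y = \{(1, 0), (0, 1)\}$ and $\operatorname{Span}(X \cup Y) = \mathbb{R}^2$. But $(\operatorname{Span} X) \cup (\operatorname{Span} Y)$ is the set of vectors
along the $x$-axis or along the $y$-axis. So statement (b) in Theorem 3.2.22 holds.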


We close this section with a recap of the use of the word span. 


Watch Your Language! Suppose $X = \{x_1, x_2, \ldots, x_n\}$ is a subset of vector space $(V, +, \cdot)$ such that
$\operatorname{Span} X = W \subseteq V$. Then the following statements are equivalent in meaning.

✓ $X$ spans the set $W$.
✓ $x_1, x_2, \ldots, x_n$ span $W$.
✓ $X$ is a spanning set for $W$.
✓ $W$ is spanned by $X$.
✓ The span of $x_1, x_2, \ldots, x_n$ is $W$.
✓ $\operatorname{Span} X$ is a vector space.

It is mathematically (and/or grammatically) incorrect to say the following.

✗ $X$ spans the vectors $w_1, w_2, w_3, \ldots$.
✗ $x_1, x_2, \ldots, x_n$ spans $W$.
✗ $x_1, x_2, \ldots, x_n$ is a spanning set for $W$.
✗ The spanning set of $V$ is a vector space.


3.2.4 Exercises 


1. True or False: A set $X$ is always a subset of $\operatorname{Span} X$.

2. True or False: $\operatorname{Span} X$ is always a subset of $X$.

3. In Example 3.2.4, why can we say that

$$\left\{ \begin{pmatrix} -\alpha + \beta \\ 0 \\ \alpha + \gamma \end{pmatrix} \;\middle|\; \alpha, \beta, \gamma \in \mathbb{R} \right\} = \left\{ \begin{pmatrix} a \\ 0 \\ b \end{pmatrix} \;\middle|\; a, b \in \mathbb{R} \right\}?$$

4. In Exercise 2 of Section 2.3, you found that some of the sets were subspaces. For each that was a
subspace, write it as a span.

5. Use an example to show that the statement $\mathcal{P}_2(\mathbb{R}) \not\subseteq \operatorname{Span}\{x^2, 1\}$ (in Example 3.2.17) is true.

6. Decide if the given vector lies in the span of the given set. If it does, find a linear combination that
makes the vector. If it does not, show that no linear combination exists.


a 1 0 
(a) {O},% 1/0], ] 0] $ inR?’. 
1 0 1 


(b) $x - x^3$, $\{x^2,\ 2x + x^2,\ x + x^3\}$, in $\mathcal{P}_3(\mathbb{R})$.


© (2){(09) G8) }msoam 


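7. Determine if the given set spans $\mathbb{R}^3$.

(a) $\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix} \right\}$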
2 1 0 
"1-0-6 
| 1 0 1 
1 3 
“16. 
0 0 
1 3 -1 2 
“LO. 0G)6 
| 1 0 0 5 
2 3 5 6 
(e) 1],;O)7,;1],]0 
(0.00 
8. Is it possible to write the span given in Example 3.2.4 as a span of other vectors? If so, give another
example. If not, justify.

9. Find a spanning set for the given subspace.

(a) The $xz$-plane in $\mathbb{R}^3$.
(b) $\{(x, y, z)^T \mid 3x + 2y + z = 0\}$ in $\mathbb{R}^3$.
(c) $\{(x, y, z, w)^T \mid 2x + y + w = 0 \text{ and } y + 2z = 0\}$ in $\mathbb{R}^4$.
(d) $\{a_0 + a_1 x + a_2 x^2 + a_3 x^3 \mid a_0 + a_1 = 0 \text{ and } a_2 - a_3 = 0\}$ in $\mathcal{P}_3(\mathbb{R})$.
(e) The set $\mathcal{P}_4(\mathbb{R})$ in the space $\mathcal{P}_4(\mathbb{R})$.
(f) $\mathcal{M}_{2\times 2}(\mathbb{R})$ in $\mathcal{M}_{2\times 2}(\mathbb{R})$.

10. Briefly explain why the incorrect statements in the "Watch Your Language!" box on page 121 are
indeed incorrect.

11. Let $u, v$ be vectors in $V$.

(a) Show that $\operatorname{Span}\{u\} = \{au \in V \mid a \in \mathbb{R}\}$.
(b) Prove that $\operatorname{Span}\{u, v\} = \operatorname{Span}\{u\}$ if and only if $v = au$ for some scalar $a$.

12. Determine whether or not $\{1, x, x^2\}$ is a spanning set for $\mathcal{P}_1(\mathbb{R})$. Justify your answer using the
definitions in this section.

13. Complete the proof of Theorem 3.2.22.

14. Show, with justification, that $\operatorname{Span}(X \cup Y) = (\operatorname{Span} X) \cup (\operatorname{Span} Y)$ (see Theorem 3.2.22) is, in
general, false.

15. Show that $(\operatorname{Span} X) \cup (\operatorname{Span} Y)$ is not necessarily a subspace.

16. Show, with justification, that $\operatorname{Span}(X \cap Y) = (\operatorname{Span} X) \cap (\operatorname{Span} Y)$ (see Theorem 3.2.22) is, in
general, false.

17. Suppose $X$ is a subset of vector space $(V, +, \cdot)$. Let $X^C$ denote the complement set of $X$ (that
is, the set of vectors that are not in $X$). Compose and prove relationships between $\operatorname{Span} X$ and
$\operatorname{Span} X^C$ in the spirit of Theorem 3.2.22.

18. In Exercise 9 Section 3.1 you found that there were infinitely many solutions to the system. Is any
equation in the span of the other two?

19. In Exercise 12 Section 3.1 is any equation in the span of the other two?

20. Use Exercises 18 and 19 above to make a similar statement about the rows of the coefficient matrix
corresponding to a system of equations.

21. Show (using the allowed operations) that any equation, formed in the elimination process for a
system of equations, is in the span of the original equations.

22. Find two different spanning sets (having different number of elements than each other) for each
of the following vector spaces.

(a) $\mathcal{P}_2(\mathbb{R})$.
(b) $\mathcal{M}_{2\times 2}(\mathbb{R})$.

Opinion: Which spanning set in each of the above is likely to be more useful?

23. Consider the space of 7-bar LCD images, $D(Z_2)$ as defined in Example 2.4.17. Let $D$ be the set
of digits of $D(Z_2)$.

(a) Sketch the zero vector.
(b) Find the additive inverse element of each of the vectors in $D$.
(c) How many vectors are in $\operatorname{Span}\{d_1\}$?
(d) How many vectors are in $\operatorname{Span}\{d_2, d_3\}$?


24. Consider, again, the space of 7-bar LCD images, D(Z2) as defined in Example 2.4.17. Let D be 
the set of digits of D(Z2). 


(a) Let $d_i$ be defined as in Exercise 21. Sketch every element of $\operatorname{Span}\{d_0, d_9\}$.

(b) Find one element of D which is in the span of the other elements of D. 

(c) Discuss how you would go about showing that the set of digit images D is a spanning set for 
D(Z2). 


25. Consider the 4 x 4 images of Example 3.1.2. Which of Images 1, 2, 3, and 4 are in the span of 
Images A, B, and C? 
26. Consider the finite vector space $V$, defined in Section 2.4 Exercise 21, defined by

$$V = \{a, b, c, d\}.$$

Define $\oplus$, vector addition, and $\odot$, scalar multiplication, according to the following tables:

⊕ | a  b  c  d          ⊙ | a  b  c  d
--+------------         --+------------
a | a  b  c  d          0 | a  a  a  a
b | b  a  d  c          1 | a  b  c  d
c | c  d  a  b
d | d  c  b  a

(a) For each vector space property, state a feature (or features) in the tables above that tells you
the property holds.
(b) Is any one of the vectors of $V$ in the span of the others? Justify.


27. Let H be the set of heat states sampled in 4 places along the rod (m = 4). Find a spanning set for 
H. 

28. How many different brain images, $u_1, u_2, \ldots, u_k$, do you think might be needed so that
$\operatorname{Span}\{u_1, u_2, \ldots, u_k\}$ includes all possible brain images of interest to a physician?


3.3 Linear Dependence and Independence 


The concepts of span and spanning sets are powerful tools for describing subspaces. We have seen 
that even a few vectors may contain all the relevant information for describing an infinite collection of 
vectors. 

Recall, we proved in Section 3.2 that the span of a set of vectors is a vector space. This leads one 
to consider the question, “Can every vector space be expressed as Span X for some set of vectors X?” 
The quick answer is “Yes.” Indeed, if V is a vector space, V = Span V. But, that is not an impressive 
answer. As stated above, we have seen that significantly fewer vectors can be used to describe the 
vector space. So, a better question is “Can every vector space be expressed as a span of a set containing 
significantly fewer vectors?” The intention is that if we have a vector space V, we hope to find X so 
that $V = \operatorname{Span} X$ is a much simpler way of expressing $V$.

Consider the problem in which an object is described by intensity values on a $4 \times 4$ rectangular
grid. That is, objects can be expressed as images in $\mathcal{I}_{4\times 4}$. Suppose that an image of interest must be a
linear combination of the seven images introduced on page 11. That is, we must choose a solution in
the set

$$W = \operatorname{Span}\{\text{Image A}, \text{Image B}, \text{Image C}, \text{Image 1}, \text{Image 2}, \text{Image 3}, \text{Image 4}\}.$$


The natural question arises: Is this the simplest description of our set of interest? We discovered 
that Images 1, 2, and 3 could all be written as linear combinations of Images A, B, and C. So, Images 
1, 2, and 3, in some sense, do not add any additional information to the set. In fact, we now understand 
that the exact same set can be described as 


$$W = \operatorname{Span}\{\text{Image A}, \text{Image B}, \text{Image C}, \text{Image 4}\}.$$


Is it possible to reduce the description of W further? This is, in general, the key question that we 
address in this section; the answer in this specific situation is Problem 2 in the exercises. 

Consider also the example of 7-bar LCD characters. We know that the set of 10 digit-images is a 
spanning set for the vector space D(Z2) (see Exercise 23 of Section 3.2). That is, 


In other words, any possible character can be written as a linear combination of these ten characters. 
The question arises: Is this the smallest possible set of characters for which this is true? Can we describe 
D(Zz2) with a smaller set? For example, is it true that 


If not, does a smaller spanning set exist or is the 10 digit-images set the smallest possible spanning 
set? What is the minimum number of vectors necessary to form a spanning set of a vector space? These 
are important questions that we are now poised to explore and answer.


3.3.1 Linear Dependence and Independence


We have seen that a key component to understanding these questions is whether or not some vectors 
can be written as linear combinations of others. If so, then our spanning sets are “too large” in the sense 
that they contain redundant information. Sets that are too large are said to be linearly dependent. Sets 
that are not too large are said to be linearly independent. 


Definition 3.3.1 


Let $(V, +, \cdot)$ be a vector space over $\mathbb{F}$ and $W \subseteq V$. We say that $W$ is linearly dependent if some
vector in $W$ is in the span of the remaining vectors of $W$. Any set that is not linearly dependent is
said to be linearly independent.


In the preceding definition, we often consider finite subsets $W$ of the vector space $V$, so
$W = \{v_1, v_2, \ldots, v_n\}$. Then $W$ is linearly dependent if for some $k$ between $1$ and $n$,
$v_k \in \operatorname{Span}\{v_1, v_2, \ldots, v_{k-1}, v_{k+1}, \ldots, v_n\}$.

Putting the definition in the context of the 4 x 4 image setting, we see the following: 


Example 3.3.2 Consider the 4 x 4 image example from the beginning of the section. We can say that 
the set of seven images is linearly dependent because, for example, 


$$\text{Image 2} \in \operatorname{Span}\{\text{Image A}, \text{Image B}, \text{Image C}, \text{Image 1}, \text{Image 3}, \text{Image 4}\}.$$

We know this is true because Image 2 can be written as a linear combination of the other images:

[Equation with images: Image 2 written as a linear combination of Image A and Image C.]

In this case, we do not lose any vectors from the span if we remove Image 2: the remaining 6 images
span the same subspace of images. We could then test the remaining vectors (images) to see if this set
is still linearly dependent; if it is, we can remove one of the remaining 6 vectors and still span the same
subspace of images. Once we end up with a linearly independent set, we have an efficient spanning set:
removing any additional vector will change the spanned subspace.

If the set W = {v1, v2,--- , vn} is linearly dependent, the choice of vectors that could be removed 
is, in general, not unique. (See Exercise 7.) (The exception is the zero vector.) 


Example 3.3.3 Suppose W = {(5) ; (3) : (‘) : (3)} C R?. W is linearly dependent because 


(5) =4 (3) —3 @: That is, the first vector in W is a linear combination of the remaining vectors 
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San 4 1 0 2 ae 1 0 2 
PPT S) ay a) A) fe ey a ay 


Example 3.3.4 Suppose $W = \{x,\ x^2 - x,\ x^2 + x\} \subseteq \mathcal{P}_2(\mathbb{R})$. $W$ is linearly dependent because
$(x^2 + x) = (x^2 - x) + 2(x)$. We have

$$\operatorname{Span}\{x,\ x^2 - x,\ x^2 + x\} = \operatorname{Span}\{x^2 - x,\ x\}.$$


Example 3.3.5 Suppose W = {(6) ; (1) C R?. Neither vector in W can be written as a linear 


combination of the other. That is fo) ¢ Span (1) and vice versa. Since W is not linearly dependent, 


W is linearly independent. 


Let’s also revisit our LCD example. 


Example 3.3.6 Consider the vector space D(Z2). We can say that the set of ten LCD character images 
is linearly dependent because, for example, 


Example 3.3.7 In any vector space, any set containing the zero vector is linearly dependent. Consider
two cases. First suppose $W = \{0, v_1, v_2, \ldots, v_n\}$. Clearly, $0$ is in $\operatorname{Span}\{v_1, v_2, \ldots, v_n\}$ since $0$ is the
trivial linear combination of vectors given by

$$0 = 0 \cdot v_1 + 0 \cdot v_2 + \cdots + 0 \cdot v_n.$$

Next, suppose $W = \{0\}$. The set of remaining vectors is then empty and, since by definition $\operatorname{Span} \emptyset = \{0\}$,
the vector $0$ is in the span of the remaining vectors; so $\{0\}$ is linearly dependent. In
both cases, we conclude that $W$ is linearly dependent.


3.3.2 Determining Linear (In)dependence


Determining whether a set is linearly dependent or independent might seem like a tedious process
in which we must test innumerable linear combinations in hopes of finding one that indicates linear
dependence. Our examples thus far have been with very small sets of a few vectors. Fortunately, we
can develop a general test for general sets with a finite number of vectors.

Let $W = \{v_1, v_2, \ldots, v_n\}$ be a subset of a vector space $(V, +, \cdot)$. Suppose, for the sake of argument,
that $v_1 = a_2 v_2 + a_3 v_3 + \cdots + a_n v_n$ for some scalars $a_2, \ldots, a_n$. Then $0 = -v_1 + a_2 v_2 + a_3 v_3 + \cdots + a_n v_n$.
And, multiplying by some arbitrary nonzero scalar yields $0 = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n$, where $\alpha_1$
is guaranteed to be nonzero. Notice that testing each vector in $W$ leads to this same equation. So, one
test for linear dependence is the existence of scalars, not all zero, which make the equation true.


Theorem 3.3.8 
Let $W = \{v_1, v_2, \ldots, v_n\}$ be a subset of vector space $(V, +, \cdot)$. $W$ is linearly dependent if and
only if there exist scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$, not all zero, such that

$$\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = 0. \qquad (3.9)$$


Proof. ($\Rightarrow$) Suppose $W$ is linearly dependent. Then some vector in $W$ can be written as a linear
combination of the other vectors in $W$. Without loss of generality⁵, suppose $v_1 = a_2 v_2 + a_3 v_3 + \cdots + a_n v_n$.
We have $0 = (-1)v_1 + a_2 v_2 + a_3 v_3 + \cdots + a_n v_n$, so that the linear dependence relation
holds for scalars not all zero.

($\Leftarrow$) Suppose $\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = 0$ for scalars not all zero. Without loss of generality,
suppose $\alpha_1 \neq 0$. Then, $v_1 = (-\alpha_2/\alpha_1) v_2 + \cdots + (-\alpha_n/\alpha_1) v_n$. Since $v_1$ is a linear combination of
the vectors $v_2, v_3, \ldots, v_n$, $W$ is linearly dependent.


Definition 3.3.9 


Equation 3.9 is called the linear dependence relation. 


Corollary 3.3.10 

Let $X = \{v_1, v_2, \ldots, v_n\}$ be a subset of the vector space $(V, +, \cdot)$. $X$ is linearly independent if
and only if the linear dependence relation $\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = 0$ has only the trivial
solution $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$.


5 We say "without loss of generality" here because, although it seems like we singled out the specific vector $v_1$, if in fact
it was a different vector that could be written as a linear combination of the others, we could change the labeling so that
the new $v_1$ is the vector that is written as a linear combination. This technique is useful because it allows us to simplify
notation. We'll use it frequently throughout the text, so even if it seems a little mysterious at first, you'll see plenty of
examples in the coming sections.




The linear dependence relation is always true if $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$, but this tells us nothing
about the linear dependence of the set. To determine linear independence, one must determine whether
or not the linear dependence relation is true only when all the scalars are zero.


Lemma 3.3.11 
Let $(V, +, \cdot)$ be a vector space. The set $\{v_1, v_2\} \subseteq V$ is linearly independent if and only if $v_1$
and $v_2$ are nonzero and $v_1$ is not a scalar multiple of $v_2$.


Proof. Let $V$ be a vector space.
($\Rightarrow$) First, we will show that if $\{v_1, v_2\}$ is linearly independent then $v_1 \neq k v_2$ for any scalar $k$. Suppose,
to the contrary, that $\{v_1, v_2\}$ is linearly independent and that $v_1 = k v_2$ for some scalar $k$. Then $v_1 - k v_2 = 0$.
So, by the definition of linear dependence, since the coefficient of $v_1$ is nonzero, $\{v_1, v_2\}$ is
linearly dependent, contradicting our original assumption about linear independence. $\rightarrow\leftarrow$ Therefore,
if $\{v_1, v_2\}$ is linearly independent, $v_1$ is not a scalar multiple of $v_2$.

($\Leftarrow$) Now, we show that if $v_1$ is not a scalar multiple of $v_2$ then $\{v_1, v_2\}$ is linearly independent. We
show this by proving the contrapositive instead. Assume that $\{v_1, v_2\}$ is linearly dependent. Then there
are scalars $\alpha_1, \alpha_2$, not both zero, so that $\alpha_1 v_1 + \alpha_2 v_2 = 0$. Suppose, without loss of generality, that
$\alpha_1 \neq 0$. Then $v_1 = -\frac{\alpha_2}{\alpha_1} v_2$. Thus, $v_1$ is a scalar multiple of $v_2$. Therefore, if $v_1$ is not a scalar multiple
of $v_2$ then $\{v_1, v_2\}$ is linearly independent.


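Example 3.3.12 Let us determine whether $\{x + 1,\ x^2 + 1,\ x^2 + x + 1\} \subseteq \mathcal{P}_2(\mathbb{R})$ is linearly dependent
or independent. We start by setting up the linear dependence relation. We let

$$\alpha(x + 1) + \beta(x^2 + 1) + \gamma(x^2 + x + 1) = 0.$$

Now, we want to decide whether or not $\alpha$, $\beta$, and $\gamma$ must all be zero. Matching up like terms in the
linear dependence relation leads to the system of equations

$$\begin{aligned}
(x^2 \text{ term:}) \quad & 0 = \beta + \gamma \\
(x \text{ term:}) \quad & 0 = \alpha + \gamma \\
(\text{constant term:}) \quad & 0 = \alpha + \beta + \gamma.
\end{aligned}$$

Using elimination (subtracting the first equation from the third), we get

$$\begin{array}{l} 0 = \beta + \gamma \\ 0 = \alpha + \gamma \\ 0 = \alpha + \beta + \gamma \end{array}
\quad \longrightarrow \quad
\begin{array}{l} 0 = \beta + \gamma \\ 0 = \alpha + \gamma \\ 0 = \alpha. \end{array}$$

The only solution to this system of equations is $\alpha = 0$, $\beta = 0$, $\gamma = 0$. This means that
$\{x + 1,\ x^2 + 1,\ x^2 + x + 1\}$ is linearly independent.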
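
The same test can be automated for any finite set of coordinate vectors. The sketch below is a minimal Python illustration, not a method taken from the text: the names `count_pivots` and `is_linearly_independent` are our own, polynomials in $\mathcal{P}_2(\mathbb{R})$ are encoded (by assumption) as triples of coefficients $(x^2, x, 1)$, and the code row reduces the coefficient matrix of the linear dependence relation. The relation has only the trivial solution exactly when every unknown gets a pivot, that is, when there are no free variables.

```python
from fractions import Fraction

def count_pivots(rows):
    """Row reduce a matrix (given as a list of rows) and count its pivot positions."""
    M = [[Fraction(entry) for entry in row] for row in rows]
    pivots = 0  # also the index of the row where the next pivot will be placed
    for col in range(len(M[0])):
        # Look for a nonzero entry in this column at or below the next pivot row.
        hit = next((i for i in range(pivots, len(M)) if M[i][col] != 0), None)
        if hit is None:
            continue  # no pivot in this column: it corresponds to a free variable
        M[pivots], M[hit] = M[hit], M[pivots]
        for i in range(pivots + 1, len(M)):
            factor = M[i][col] / M[pivots][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[pivots])]
        pivots += 1
        if pivots == len(M):
            break
    return pivots

def is_linearly_independent(vectors):
    """True when the linear dependence relation has only the trivial solution."""
    # Each vector becomes a column of the coefficient matrix of the relation.
    coefficient_rows = [list(row) for row in zip(*vectors)]
    return count_pivots(coefficient_rows) == len(vectors)

# Coordinates are (x^2 coefficient, x coefficient, constant term).
example_3_3_12 = [(0, 1, 1), (1, 0, 1), (1, 1, 1)]    # x + 1, x^2 + 1, x^2 + x + 1
example_3_3_4 = [(0, 1, 0), (1, -1, 0), (1, 1, 0)]    # x, x^2 - x, x^2 + x

result_12 = is_linearly_independent(example_3_3_12)
result_4 = is_linearly_independent(example_3_3_4)
```

With these inputs, `result_12` agrees with the conclusion of Example 3.3.12, and `result_4` reports the dependence observed for the set $\{x,\ x^2 - x,\ x^2 + x\}$ of Example 3.3.4.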


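Example 3.3.13 Now, let us determine the linear dependence of the set

\[
\left\{ \begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix},\; \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},\; \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} \right\}.
\]

Again, we begin by setting up the linear dependence relation. Let

\[
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = \alpha \begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix} + \beta \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} + \gamma \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}.
\]

We want to find $\alpha$, $\beta$, and $\gamma$ so that this is true. Matching up entries, we get the following system of equations.

\[
\begin{array}{lrcl}
((1,1) \text{ entry:}) & 0 &=& \alpha + \beta + \gamma \\
((1,2) \text{ entry:}) & 0 &=& 3\alpha + \beta + 2\gamma \\
((2,1) \text{ entry:}) & 0 &=& \alpha + \beta + \gamma \\
((2,2) \text{ entry:}) & 0 &=& \alpha - \beta
\end{array}
\]

We again use elimination, but this time, let us use a coefficient matrix and reduce it.

\[
\left(\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 3 & 1 & 2 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & -1 & 0 & 0 \end{array}\right)
\xrightarrow[R_3 = -r_1 + r_3,\; R_4 = -r_1 + r_4]{R_2 = -3r_1 + r_2}
\left(\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 0 & -2 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & -2 & -1 & 0 \end{array}\right)
\]

\[
\xrightarrow[R_4 = -r_2 + r_4]{R_2 = -\frac{1}{2} r_2}
\left(\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 0 & 1 & \frac{1}{2} & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)
\xrightarrow{R_1 = -r_2 + r_1}
\left(\begin{array}{ccc|c} 1 & 0 & \frac{1}{2} & 0 \\ 0 & 1 & \frac{1}{2} & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)
\]

Thus, $\gamma$ can be any real number and $\alpha = -\frac{1}{2}\gamma$ and $\beta = -\frac{1}{2}\gamma$. Thus there are infinitely many possible choices for $\alpha$, $\beta$, and $\gamma$. Thus,

\[
\left\{ \begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix},\; \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},\; \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} \right\}
\]

is a linearly dependent set. Indeed, we found that (choosing $\gamma = 2$)

\[
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = -\begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} + 2\begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}.
\]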
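The dependence relation found in Example 3.3.13 can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `combine` and `solution` are made up for illustration, and exact fractions are used so that no rounding occurs. It confirms that every choice of the free variable $\gamma$, with $\alpha = \beta = -\frac{1}{2}\gamma$, produces the zero matrix.

```python
from fractions import Fraction

# The three 2x2 matrices from Example 3.3.13, stored row by row.
A1 = [[1, 3], [1, 1]]
A2 = [[1, 1], [1, -1]]
A3 = [[1, 2], [1, 0]]

def combine(alpha, beta, gamma):
    """Return alpha*A1 + beta*A2 + gamma*A3, computed entry by entry."""
    return [[alpha * A1[i][j] + beta * A2[i][j] + gamma * A3[i][j]
             for j in range(2)] for i in range(2)]

def solution(gamma):
    """Coefficients (alpha, beta, gamma) read off the reduced matrix."""
    g = Fraction(gamma)
    return (-g / 2, -g / 2, g)

zero = [[0, 0], [0, 0]]
# Each choice of the free variable gamma should give the zero matrix.
checks = [combine(*solution(g)) == zero for g in (2, -3, 7)]
print(checks)
print(solution(2))  # the coefficients used at the end of the example
```

Choosing $\gamma = 2$ reproduces the coefficients $-1$, $-1$, $2$ used in the example.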


The linear (in)dependence of the set of vectors given by the columns of an n x n matrix M helps 
to determine the nature of the solutions to the matrix equation Mx = b. 


Theorem 3.3.14 


Let $M$ be an $n \times n$ matrix with columns $v_1, v_2, \ldots, v_n \in \mathbb{R}^n$. The matrix equation $Mx = b$ has a unique solution $x \in \mathbb{R}^n$ for every $b \in \mathbb{R}^n$ if, and only if, $\{v_1, v_2, \ldots, v_n\}$ is a linearly independent set.

Proof. (⇒) Suppose $M$ is an $n \times n$ matrix with columns $v_1, v_2, \ldots, v_n$, and the matrix equation $Mx = b$ has a unique solution $x \in \mathbb{R}^n$ for every $b \in \mathbb{R}^n$. That is, $b$ is uniquely written as $b = Mw = w_1 v_1 + w_2 v_2 + \ldots + w_n v_n$. Now, suppose, by way of contradiction, that $\{v_1, v_2, \ldots, v_n\}$ is linearly dependent. Then there exist scalars $y_1, y_2, \ldots, y_n$, not all zero, such that $y_1 v_1 + y_2 v_2 + \ldots + y_n v_n = 0$. Then $w + y \neq w$ also solves the matrix equation $Mx = b$, a contradiction. Thus, $\{v_1, v_2, \ldots, v_n\}$ is linearly independent.

(⇐) Suppose $M$ is an $n \times n$ matrix with columns $v_1, v_2, \ldots, v_n$ that form a linearly independent set. Suppose, by way of contradiction, that there exists $b \in \mathbb{R}^n$ such that $Mx = b$ and $My = b$ for some $x \neq y$. Then $M(x - y) = 0$, or equivalently, $(x_1 - y_1)v_1 + \ldots + (x_n - y_n)v_n = 0$. Since $x \neq y$, $x_k - y_k \neq 0$ for some $k$, so $\{v_1, v_2, \ldots, v_n\}$ is linearly dependent, a contradiction. Thus, $Mx = b$ has a unique solution $x \in \mathbb{R}^n$ for every $b \in \mathbb{R}^n$.
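The first direction of the proof can be illustrated with a small computation. The sketch below uses a hypothetical $2 \times 2$ matrix chosen only for illustration (it does not appear in the text): its second column is twice the first, so the columns are linearly dependent, and adding the coefficient vector $y$ of a dependence relation to one solution $w$ yields a second, different solution of $Mx = b$.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by the vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

# Hypothetical example: the second column is twice the first,
# so the columns form a linearly dependent set.
M = [[1, 2],
     [2, 4]]
w = [1, 1]          # one solution of Mx = b
b = matvec(M, w)    # the right-hand side determined by w
y = [2, -1]         # nontrivial relation: 2*v1 - 1*v2 = 0
w_plus_y = [wi + yi for wi, yi in zip(w, y)]

print(matvec(M, y))         # the zero vector, so y encodes a dependence
print(matvec(M, w_plus_y))  # equals b again, although w_plus_y != w
```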


Linear dependence of a set can also be determined by considering spans of proper subsets of the 
set. 


Theorem 3.3.15 
Let X be a subset of the vector space (V, +, -). X is linearly dependent if and only if there exists 
a proper subset U of X such that Span U = Span X. 


The proof is the subject of Exercise 15. 


Example 3.3.16 Consider X = {(6) ‘ (5) ; (‘) : (3) C R?. We notice that Span X = IR? and 


v= {(6) , (1) X with Span U = R2. Thus, X is linearly dependent. 
Example 3.3.17 Consider the set of images

X = {Image A, Image B, Image C, Image 1, Image 2, Image 3, Image 4}.

We have seen that

Y = {Image A, Image B, Image C, Image 4}

is such that Span X = Span Y. Thus X is linearly dependent.
Building off of Theorem 3.3.15, it seems that if $X$ is a linearly independent set and $X \subseteq Y$, but Span $Y \neq$ Span $X$, then there must be a vector $v$ in $Y$ so that $X \cup \{v\}$ is also linearly independent. Indeed, the next theorem gives a more precise result.


Theorem 3.3.18 
Let $X$ be a linearly independent subset of the vector space $(V, +, \cdot)$ and let $v \in V$ be a vector so that $v \notin$ Span $X$. Then $X \cup \{v\}$ is linearly independent.


Proof. Let $(V, +, \cdot)$ be a vector space, $X = \{x_1, x_2, \ldots, x_n\} \subseteq V$ be a linearly independent set of vectors, and let $v \in V$ be a vector not in the span of $X$. Now, suppose there are scalars $\alpha_1, \alpha_2, \ldots, \alpha_{n+1}$ so that

\[ \alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_n x_n + \alpha_{n+1} v = 0. \]

Notice that $\alpha_{n+1} = 0$, for otherwise

\[ v = -\frac{\alpha_1}{\alpha_{n+1}} x_1 - \frac{\alpha_2}{\alpha_{n+1}} x_2 - \ldots - \frac{\alpha_n}{\alpha_{n+1}} x_n \in \text{Span}\, X. \]

Therefore,

\[ \alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_n x_n = 0. \]

Since $X$ is linearly independent, $\alpha_1 = \alpha_2 = \ldots = \alpha_n = 0$. Therefore, $\{x_1, x_2, \ldots, x_n, v\}$ is a linearly independent set.


Example 3.3.19 Notice that $X = \{x^2 + 2,\; 2\} \subseteq \mathcal{P}_2$ is a linearly independent set because the vectors are not multiples of one another. Also, we can easily see that $x^2 + 3x - 2 \notin$ Span $X$. So, we have found an even larger linearly independent set, $X \cup \{x^2 + 3x - 2\} = \{x^2 + 2,\; 2,\; x^2 + 3x - 2\} \subseteq \mathcal{P}_2$.


3.3.3 Summary of Linear Dependence 


As we have seen with all other concepts in this text, there is appropriate and inappropriate language 
surrounding the idea of linear (in)dependence. Below, we share the appropriate way to discuss the 
linear dependence of vectors and sets of vectors. 


Watch Your Language! Linear dependence is a property of a set of vectors, not a property of a vector. For a linearly dependent set $W = \{v_1, v_2, \ldots, v_n\}$, we have the following grammatically and mathematically correct statements:

✓ $W$ is linearly (in)dependent.
✓ $W$ is a linearly (in)dependent set.
✓ $\{v_1, v_2, \ldots, v_n\}$ is linearly (in)dependent.
✓ The vectors $v_1, v_2, \ldots, v_n$ are linearly (in)dependent.
✓ The vectors $v_1, v_2, \ldots, v_n$ form a linearly (in)dependent set.
✓ The columns (or rows) of a matrix, $M$, form a linearly (in)dependent set.

But it would be incorrect to say

✗ $W$ has linearly (in)dependent vectors.
✗ $\{v_1, v_2, \ldots, v_n\}$ are linearly (in)dependent.
✗ The matrix, $M$, is linearly (in)dependent.


In general, to show that a set is linearly dependent, you need to exhibit only one linear dependency (i.e., you need only write one vector as a linear combination of the others, or equivalently, find one set of constants, not all zero, for which the linear dependence relation holds).

To show a set is linearly independent, on the other hand, requires that you show that no nontrivial solutions to the linear dependence relation exist; that is, you must show that it is not possible to write any of the vectors in the set as a linear combination of the others.

We summarize a few of the observations from the examples in this section that can help with the process of categorizing whether a set in an arbitrary vector space is linearly dependent or independent.


Determining if a set $W$ is linearly dependent or linearly independent

1. If $W$ contains the zero vector, then $W$ is linearly dependent.
2. If $W$ contains a single vector and it is not zero, then $W$ is linearly independent.
3. The empty set is linearly independent. (See Exercise 16.)
4. If $W$ contains two vectors, and neither is zero, then $W$ is linearly dependent if and only if one vector is a scalar multiple of the other.
5. If any element of $W$ is a scalar multiple of another element of $W$, then $W$ is linearly dependent.
6. If any element of $W$ is a linear combination of other elements of $W$, then $W$ is linearly dependent.
7. If $W = \{w_1, \ldots, w_k\} \subseteq \mathbb{R}^n$, then we can row reduce the augmented matrix $[w_1 \; \ldots \; w_k \mid 0]$. If the corresponding system has only the trivial solution, then $W$ is linearly independent. On the other hand, if the system has infinitely many solutions then $W$ is linearly dependent. Moreover, the process of row reducing a matrix does not change linear dependencies between columns, so one can often easily identify linear dependencies from the row reduced matrix.
8. If $W = \{v_1, v_2, \ldots, v_n\}$ then $W$ is linearly independent if

\[ \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = 0 \]

has only the trivial solution

\[ \alpha_1 = \alpha_2 = \cdots = \alpha_n = 0. \]
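The row-reduction test in the list above can be sketched in a few lines of code. This is only an illustrative sketch under stated assumptions: the function names `count_pivots` and `is_linearly_independent` are hypothetical, vectors are given as tuples of numbers, and exact fractions are used to avoid rounding. The set is declared linearly independent exactly when every column of the matrix $[w_1 \; \ldots \; w_k]$ receives a pivot, which is the case in which the homogeneous system has only the trivial solution.

```python
from fractions import Fraction

def count_pivots(vectors):
    """Row reduce the matrix whose columns are `vectors`; return the pivot count."""
    if not vectors:
        return 0
    n_rows, n_cols = len(vectors[0]), len(vectors)
    # rows[i][j] is entry i of vector j, so the vectors are the columns.
    rows = [[Fraction(v[i]) for v in vectors] for i in range(n_rows)]
    pivots = 0
    for col in range(n_cols):
        # Look for a row at or below position `pivots` with a nonzero entry.
        src = next((r for r in range(pivots, n_rows) if rows[r][col] != 0), None)
        if src is None:
            continue  # no pivot in this column: it corresponds to a free variable
        rows[pivots], rows[src] = rows[src], rows[pivots]
        for r in range(n_rows):
            if r != pivots and rows[r][col] != 0:
                factor = rows[r][col] / rows[pivots][col]
                rows[r] = [a - factor * p for a, p in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots

def is_linearly_independent(vectors):
    """True when the homogeneous system has only the trivial solution."""
    return count_pivots(vectors) == len(vectors)

# Example 3.3.12, with each polynomial stored as (constant, x, x^2) coefficients.
polys = [(1, 1, 0), (1, 0, 1), (1, 1, 1)]
# Example 3.3.13, with each 2x2 matrix flattened row by row.
mats = [(1, 3, 1, 1), (1, 1, 1, -1), (1, 2, 1, 0)]
print(is_linearly_independent(polys))
print(is_linearly_independent(mats))
```

Storing polynomials and matrices as coefficient tuples is a convenience assumed here; it works because the linear dependence relation is checked entry by entry in both examples.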


3.3.4 Exercises


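1. Use the linear dependence relation to determine whether the given set is linearly independent or linearly dependent.

(a) $\left\{ \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ -1 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\} \subseteq \mathbb{R}^3$.
(b) $\{1, x, x^2\} \subseteq \mathcal{P}_2(\mathbb{R})$.
(c) $\{1, x + x^2, x^2\} \subseteq \mathcal{P}_2(\mathbb{R})$.
(d) $\{1, 1 - x, 1 + x, 1 + x^2\} \subseteq \mathcal{P}_2(\mathbb{R})$.
(e) $\{1 + x, 1 - x, x\} \subseteq \mathcal{P}_2(\mathbb{R})$.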
11 0 1 —-12
(f) {(; 0) , (° a , ( I | © M2x2(R) 
(g) $\{\sin x, \sin 2x, \sin 3x\} \subseteq \mathcal{F}(\mathbb{R})$, where $\mathcal{F}(\mathbb{R})$ is the vector space of functions on $[0, \pi]$.
(h) [set of 7-bar LCD character images] $\subseteq D(\mathbb{Z}_2)$ (the vector space of 7-bar LCD images).


2. Given the vector space $W$ given by

W = Span {Image A, Image B, Image C, Image 4}.

Can you write $W$ as a span of fewer vectors? Justify.


3. Let $(V, +, \cdot)$ be a vector space with scalar field $\mathbb{R}$. Suppose the set of vectors $\{v_1, v_2, v_3, v_4\}$ is linearly independent. Determine if the following sets are linearly independent. Justify your answer. If not, remove as few vectors as you can so the resulting set is linearly independent.

(a) $\{v_1, v_2\}$.
(b) $\{v_1, v_2, v_3, v_4, v_1 - 2v_3\}$.
(c) $\{v_1 + v_3, v_2 + v_4, v_3, v_4\}$.
(d) $\{v_1 - 2v_2, v_2, v_3 - v_4 - v_2, v_4\}$.

4. True or False: Suppose $\{v_1, \ldots, v_k\}$ is a linearly dependent set of vectors in a vector space $V$, so that $v_1 \in$ Span$\{v_2, \ldots, v_k\}$. Then there must exist $i \in \{2, \ldots, k\}$ so that $v_i \in$ Span$\{v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_k\}$.

5. Given the linearly independent set of vectors $S = \{u, v, w\}$, show that the set of vectors $T = \{u + 2v, u - w, v + w\}$ is linearly independent and that Span $S$ = Span $T$.

6. Given a linearly independent set $S$, use Exercise 5 to make a general statement about how to obtain a different linearly independent set of vectors $T$ with Span $S$ = Span $T$. Be careful to use accurate linear algebra language.

7. Show that $X = \{x + 2, x - 2, 3x^2, 2x - 1, 1\}$ is a linearly dependent set. Now, find two different linearly independent subsets of $X$ containing only three vectors.


8. Does the vector space $\{0\}$ have a linearly independent subset?
9. Find a set of four distinct vectors in $\mathbb{R}^3$ that is linearly dependent and for which no subset of three vectors is linearly independent.

10. Given a homogeneous system of three linear equations with three variables, show that the system has only the trivial solution whenever the corresponding coefficient matrix has linearly independent rows.

11. Consider the given heat states in $H_4(\mathbb{R})$ (Figure 3.7 in Section 3.4). Find a linearly independent set of four heat states.


12. Let $\mathcal{I}_4(\mathbb{R})$ be the vector space of 4 × 4 grayscale images. Show that Span $Y \neq \mathcal{I}_4(\mathbb{R})$ in Example 3.3.17.


13. Find four elements of $D$ which form a linearly independent set in $D(\mathbb{Z}_2)$. See Example 2.4.17.
14. Determine whether $S = \{I_1, I_2, I_3\}$, where the $I_n$ are given below, is a linearly independent set in the vector space of images with the given geometry.

[Images $I_1$, $I_2$, $I_3$]


15. Prove Theorem 3.3.15.
16. Use Theorem 3.3.15 to determine if the given set is linearly dependent or linearly independent.

(a) $W = \emptyset$.
(b) $W = \{0\}$.
(c) $W = \{1 + x, 1 - x, x\} \subseteq \mathcal{P}_2(\mathbb{R})$.
(d)

17. Consider the 4 × 4 images of Example 3.3.17. Determine a linearly dependent set distinct from $Y$.
18. Consider the finite vector space, given in Section 2.4 Exercise 21, by

$V = \{a, b, c, d\}$.

Define ⊕, vector addition, and ⊙, scalar multiplication, according to the following tables:

⊕ | a  b  c  d
a | a  b  c  d
b | b  a  d  c
c | c  d  a  b
d | d  c  b  a

⊙ | a  b  c  d
0 | a  a  a  a
1 | a  b  c  d

(a) List, if possible, one subset of $V$ with two or more elements that is linearly independent. Justify.
(b) List, if possible, a set with three or fewer elements that is linearly dependent. Justify.


19. In Section 3.1, Exercise 28, we began combining equivalent statements. Continue combining 
equivalent statements by completing the theorem below. (If you have already completed that 
exercise, you need only add one more statement from this section, about linear independence. 
In particular, what vectors are related to the matrix A, and what can you say about their linear 
independence/dependence?) 


Theorem 3.3.20 
Let $A$ be an $n \times n$ matrix and $b \in \mathbb{R}^n$. The following statements are equivalent.


• $Ax = 0$ has a unique solution.


Prove Theorem 3.3.20. 


3.4 Basis and Dimension 


We have now seen many examples of vector spaces and subspaces that can be described as the span of 
some smaller set of vectors. We have also seen that there is a lot of freedom in the choice of a spanning 
set. Moreover, two sets of very different sizes can span the same vector space. This suggests that larger 
sets contain redundant information in the form of “extra” vectors. 

If a spanning set does contain “extra” vectors, the set is linearly dependent. We seek an efficient 
description of a vector space in terms of some spanning set. We see that a most efficient set must have 
two properties. First, it should be able to generate the vector space of interest (be a spanning set). 
Second, it should contain no unnecessary vectors (be linearly independent). Any set with these two 
properties is a maximally efficient spanning set for the vector space. 

Consider, again, the problem in which an object is described by radiographic intensity values on 
a 4 x 4 grid. In Section 3.3, we found two spanning sets for the set of all objects that are linear 
combinations of the seven images from page 11. One was considered a “better” spanning set because 
it did not hold redundant information. In this section, we will explore sets that hold all the information 
we need to recreate a vector space and not more information than we need. We will also use these sets 
to define the dimension of a vector space. 


3.4.1 Efficient Heat State Descriptions 


Consider the set of six heat states $X = \{h_1, h_2, h_3, h_4, h_5, h_6\}$ shown in Figure 3.7 for $m = 4$. We are interested in two questions. First, can this set of heat states be used to describe (via linear combinations) all vectors in the vector space $(H_4(\mathbb{R}), +, \cdot)$ of heat states sampled in 6 places along a rod? Second, if so, is this the most efficient spanning set? In answering these two questions we will use techniques from Sections 3.2 and 3.3.
Fig. 3.7 The set $X = \{h_1, h_2, h_3, h_4, h_5, h_6\}$ of six heat states in $H_4(\mathbb{R})$. Each panel plots temperature $T$ (from $T = 0$ to $T = 1$) against position $x$ (from $x = 0$ to $x = L$).


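Can this set of heat states be used to describe all vectors in the vector space $H_4(\mathbb{R})$? In other words, is Span $X = H_4(\mathbb{R})$? First note that $X$ is a subset of $H_4(\mathbb{R})$. Thus, Span $X \subseteq H_4(\mathbb{R})$. We need only show that $H_4(\mathbb{R}) \subseteq$ Span $X$ by showing that an arbitrary vector $v$ in $H_4(\mathbb{R})$ is in Span $X$, i.e., it can be written as a linear combination of vectors in $X$.

For ease, we write heat states as $(m + 2)$-tuples of scalars (with first and last entries 0). For example, in our set $X$, $h_1 = (0, 1, 0, 0, 1, 0)$ and $h_2 = (0, 0, 0, 1, 0, 0)$. Consider arbitrary heat state $v = (0, a, b, c, d, 0) \in H_4(\mathbb{R})$, where $a, b, c, d \in \mathbb{R}$. We need to show that there exist scalars $\alpha_1, \ldots, \alpha_6$ such that

\[ v = \alpha_1 h_1 + \alpha_2 h_2 + \cdots + \alpha_6 h_6. \]

Substituting for $v, h_1, \ldots, h_6$ above yields

\[ (0, a, b, c, d, 0) = (0,\; \alpha_1 + \alpha_6,\; \alpha_3 + \alpha_4,\; \alpha_2 + \alpha_3 + \alpha_5 + \alpha_6,\; \alpha_1 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_6,\; 0) \]

with the equivalent system of equations

\[
\begin{array}{rcl}
\alpha_1 + \alpha_6 &=& a \\
\alpha_3 + \alpha_4 &=& b \\
\alpha_2 + \alpha_3 + \alpha_5 + \alpha_6 &=& c \\
\alpha_1 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_6 &=& d.
\end{array}
\]

We can also write this system as the matrix equation

\[
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \alpha_5 \\ \alpha_6 \end{pmatrix}
= \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix},
\]

with the equivalent augmented matrix

\[
\left(\begin{array}{cccccc|c} 1 & 0 & 0 & 0 & 0 & 1 & a \\ 0 & 0 & 1 & 1 & 0 & 0 & b \\ 0 & 1 & 1 & 0 & 1 & 1 & c \\ 1 & 0 & 1 & 1 & 1 & 1 & d \end{array}\right).
\]

The reduced row echelon form of this matrix is

\[
\left(\begin{array}{cccccc|c} 1 & 0 & 0 & 0 & 0 & 1 & a \\ 0 & 1 & 0 & -1 & 0 & 1 & a + c - d \\ 0 & 0 & 1 & 1 & 0 & 0 & b \\ 0 & 0 & 0 & 0 & 1 & 0 & -a - b + d \end{array}\right).
\]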

Thus, we see that the system is consistent and has solutions for all values of a, b, c, and d. We have 
shown that H4(R) C Span X. Together with our previous findings, we conclude that X is indeed a 
spanning set for H4(R). 


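Is $X$ the most efficient spanning set? In other words, do we need all six heat states in order to form a spanning set? Could we find a smaller subset of $X$ that still forms a spanning set? If we examine the reduced row echelon form of the augmented matrix, we can write the solution set (in terms of free variables $\alpha_4$ and $\alpha_6$):

\[
\begin{array}{rcl}
\alpha_1 &=& a - \alpha_6 \\
\alpha_2 &=& a + c - d + \alpha_4 - \alpha_6 \\
\alpha_3 &=& b - \alpha_4 \\
\alpha_5 &=& -a - b + d.
\end{array}
\]

The solution set in parametric form is

\[
\left\{ \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \alpha_5 \\ \alpha_6 \end{pmatrix}
= \begin{pmatrix} a \\ a + c - d \\ b \\ 0 \\ -a - b + d \\ 0 \end{pmatrix}
+ \alpha_4 \begin{pmatrix} 0 \\ 1 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ \alpha_6 \begin{pmatrix} -1 \\ -1 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
\;\middle|\; \alpha_4, \alpha_6 \in \mathbb{R} \right\}.
\]

We see that we can write vector $v = (0, a, b, c, d, 0)$ by choosing any $\alpha_4, \alpha_6 \in \mathbb{R}$. In particular, we can choose $\alpha_4 = \alpha_6 = 0$ so that (arbitrary) $v$ can be written in terms of $h_1$, $h_2$, $h_3$, and $h_5$, with coefficients $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_5$. That is, $Y = \{h_1, h_2, h_3, h_5\}$ is a more efficient spanning set for $H_4(\mathbb{R})$ than $X$.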

We also see that no proper subset of Y can span H4(R) because for arbitrary constants a, b, c, and 
d, we need all four coefficients a1, a2, a3, and a5. This means that Y is, in some sense, a smallest 
spanning set. Another way to think about this is to recall that X is larger because it must contain 
redundant information. And Y, because it cannot be smaller, must not contain redundant information. 
Thus, we suspect that X is linearly dependent and Y is linearly independent. We next check these 
assertions. 


Example 3.4.1 Show that $X$ is linearly dependent. Notice that $h_1 + h_2 - h_6 = 0$. Thus, by Theorem 3.3.8, $X$ is linearly dependent.


Example 3.4.2 Show that $Y$ is linearly independent. From the solution set above, we see that the zero heat state can only be written as the trivial linear combination of the heat states in $Y$. That is, for $a = b = c = d = 0$ we have the unique solution $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_5 = 0$. Thus, by Corollary 3.3.10, $Y$ is linearly independent.
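The choice of $Y$ can be cross-checked by locating the pivot columns of the coefficient matrix of this section. The sketch below is illustrative only: the function name `pivot_columns` is hypothetical, and the tuples for $h_3$ through $h_6$ are read off from the columns of that coefficient matrix (only $h_1$ and $h_2$ are written out explicitly in the text).

```python
from fractions import Fraction

def pivot_columns(matrix):
    """Return the (0-based) indices of the pivot columns found by row reduction."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots = []
    for col in range(len(rows[0])):
        r0 = len(pivots)  # row that would receive the next pivot
        src = next((r for r in range(r0, len(rows)) if rows[r][col] != 0), None)
        if src is None:
            continue  # free column
        rows[r0], rows[src] = rows[src], rows[r0]
        for r in range(len(rows)):
            if r != r0 and rows[r][col] != 0:
                factor = rows[r][col] / rows[r0][col]
                rows[r] = [a - factor * p for a, p in zip(rows[r], rows[r0])]
        pivots.append(col)
    return pivots

# Heat states as 6-tuples; h3..h6 are assumed from the coefficient matrix columns.
h = [(0, 1, 0, 0, 1, 0), (0, 0, 0, 1, 0, 0), (0, 0, 1, 1, 1, 0),
     (0, 0, 1, 0, 1, 0), (0, 0, 0, 1, 1, 0), (0, 1, 0, 1, 1, 0)]

# Keep the four interior entries and place the heat states in the columns.
M = [[hj[i] for hj in h] for i in range(1, 5)]
pivot_labels = [p + 1 for p in pivot_columns(M)]  # 1-based heat state labels
print(pivot_labels)

# The dependence used in Example 3.4.1: h1 + h2 - h6 is the zero heat state.
relation = [x1 + x2 - x6 for x1, x2, x6 in zip(h[0], h[1], h[5])]
print(relation)
```

The pivot columns correspond to $h_1$, $h_2$, $h_3$, and $h_5$, matching the set $Y$; the non-pivot columns correspond to the free variables $\alpha_4$ and $\alpha_6$.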


The results of the examples above indicate that there is no subset of X consisting of fewer than 4 
heat states in H4(R) that spans H4(R). In fact there is no spanning set of H4(R) containing fewer than 
4 heat states (see Exercise 15). 

The relationship between minimal spanning sets and linear independence is explored in the next 
sections. 


3.4.2. Basis 


Linearly independent spanning sets for vector spaces, like the set of heat states $Y = \{h_1, h_2, h_3, h_5\} \subseteq H_4(\mathbb{R})$ that we found in the previous section, play an important role in linear algebra. We call such a set a basis.


Definition 3.4.3 


A subset $\mathcal{B}$ of a vector space $(V, +, \cdot)$ is called a basis of $V$ if Span $\mathcal{B} = V$ and $\mathcal{B}$ is linearly independent.


The first condition in the definition of a basis gives us that $\mathcal{B}$ is big enough to describe all of $V$ ($\mathcal{B}$ is a spanning set for $V$) and the second condition says that $\mathcal{B}$ is not so big that it contains redundant information.

Example 3.4.4 The set $Y = \{h_1, h_2, h_3, h_5\}$ is a basis for $H_4(\mathbb{R})$.


It turns out that bases are not unique. Can you think of a different basis for H4(R) other than Y? 
We now consider bases for Euclidean spaces. 


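Example 3.4.5 A basis for $\mathbb{R}^3$ is

\[
S = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}.
\]

First, we show that Span $S = \mathbb{R}^3$. Notice that Span $S \subseteq \mathbb{R}^3$ because arbitrary linear combinations of vectors in $S$ are also vectors in $\mathbb{R}^3$. Next consider an arbitrary vector in $\mathbb{R}^3$,

\[
v = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in \mathbb{R}^3.
\]

We see that

\[
v = a \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + b \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + c \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \qquad (3.10)
\]

which shows that $v$ can be written as a linear combination of vectors in $S$. So, $v \in$ Span $S$ and $\mathbb{R}^3 \subseteq$ Span $S$. Together we have Span $S = \mathbb{R}^3$.
Now, we show that $S$ is linearly independent. The equation

\[
\alpha \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + \gamma \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]

has unique solution

\[
\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},
\]

or, $\alpha = \beta = \gamma = 0$. Hence by the linear dependence relation test (Theorem 3.3.8), $S$ is linearly independent. We showed both conditions for a basis hold for $S$, thus this set is a basis for $\mathbb{R}^3$.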


The basis of the previous example is very natural and used quite often because, given a vector, we can easily see the linear combination of basis vectors that produces the vector (as in equation 3.10). This special basis is called the standard basis for $\mathbb{R}^3$. We also introduce notation for each of its vectors. We let $e_1$, $e_2$, and $e_3$ denote the three vectors in $S$, where

\[
e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]

In general, the standard basis for $\mathbb{R}^n$ is $\{e_1, e_2, \ldots, e_n\}$, where $e_i$ is the vector with zeros in every position except the $i$th entry, which contains a one.
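Example 3.4.6 Another basis for $\mathbb{R}^3$ is

\[
B = \left\{ \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}.
\]

By Exercise 7b in Section 3.2, we have that Span $B = \mathbb{R}^3$. So, we need only show that $B$ is linearly independent. Let

\[
\alpha \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} + \beta \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \gamma \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]

Then

\[
\begin{pmatrix} 2\alpha + \beta \\ \beta \\ \alpha + \gamma \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]

Thus, by matching components, we see that $\alpha = \beta = \gamma = 0$. So, $B$ is linearly independent and is therefore a basis for $\mathbb{R}^3$.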


The previous two examples make it clear that a basis for a vector space is not unique. Suppose we have a basis $B = \{v_1, v_2, \ldots, v_n\}$ for some vector space $V$ over $\mathbb{R}$. Then $B' = \{\alpha v_1, v_2, \ldots, v_n\}$ is also a basis for $V$ for any nonzero scalar $\alpha \in \mathbb{R}$.


Example 3.4.7 Which of the following sets of vectors in Figure 3.8 are bases for $\mathbb{R}^2$? Decide which you think are bases, and why, then check your answers below.


Fig. 3.8 Six sets of candidate basis vectors for $\mathbb{R}^2$. (See Example 3.4.7.)


Answers: (b) and (d) are bases; (a), (e), and (f) are not linearly independent; and (c) and (e) do not span $\mathbb{R}^2$.
Example 3.4.8 A basis for $\mathcal{P}_2(\mathbb{R})$ is $S = \{1, x, x^2\}$. In Exercise 1b in Section 3.3, we showed that $S$ is linearly independent. And, Span $S \subseteq \mathcal{P}_2(\mathbb{R})$. So we need to show that $\mathcal{P}_2(\mathbb{R}) \subseteq$ Span $S$. Let $v = ax^2 + bx + c \in \mathcal{P}_2(\mathbb{R})$. Then $v$ is a linear combination of $1$, $x$, and $x^2$. So $v \in$ Span $S$. Thus Span $S = \mathcal{P}_2(\mathbb{R})$. Therefore, $S$ is a basis of $\mathcal{P}_2(\mathbb{R})$.


The basis $S = \{1, x, x^2\}$ is called the standard basis for $\mathcal{P}_2(\mathbb{R})$, because, as with the standard basis for $\mathbb{R}^3$, one can trivially recover the linear combination of basis vectors that produces any polynomial in $\mathcal{P}_2(\mathbb{R})$.


Example 3.4.9 Let $B = \{1, x + x^2, x^2\}$. In this example, we will show that $B$ is also a basis for $\mathcal{P}_2(\mathbb{R})$. In Exercise 1c in Section 3.3 we showed that $B$ is linearly independent. So, we need only show that Span $B = \mathcal{P}_2(\mathbb{R})$. This task reduces to showing that $\mathcal{P}_2(\mathbb{R}) \subseteq$ Span $B$ since it is clear that Span $B \subseteq \mathcal{P}_2(\mathbb{R})$. Let $v = ax^2 + bx + c \in \mathcal{P}_2(\mathbb{R})$. We want to find $\alpha$, $\beta$, and $\gamma$ so that

\[ \alpha(1) + \beta(x + x^2) + \gamma(x^2) = ax^2 + bx + c. \]

Matching up like terms, we see that $\alpha = c$, $\beta = b$, and $\beta + \gamma = a$, or $\gamma = a - b$. That is, $v = c(1) + b(x + x^2) + (a - b)x^2 \in$ Span $B$. Thus, $B$ is a basis for $\mathcal{P}_2(\mathbb{R})$.
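The coefficient matching in Example 3.4.9 can be spot-checked with a short computation. The sketch below is illustrative: the function names `coords_in_B` and `expand` and the sample polynomial are assumptions made here, with a polynomial $ax^2 + bx + c$ stored as the triple `(a, b, c)`.

```python
def coords_in_B(a, b, c):
    """Coordinates (alpha, beta, gamma) of ax^2 + bx + c in B = {1, x + x^2, x^2}."""
    return (c, b, a - b)

def expand(alpha, beta, gamma):
    """Coefficients (x^2, x, constant) of alpha*1 + beta*(x + x^2) + gamma*x^2."""
    return (beta + gamma, beta, alpha)

v = (3, -5, 7)  # the sample polynomial 3x^2 - 5x + 7
print(coords_in_B(*v))
print(expand(*coords_in_B(*v)) == v)  # expanding the combination recovers v
```

Expanding the combination and recovering the original coefficients for every triple is exactly the statement that $\mathcal{P}_2(\mathbb{R}) \subseteq$ Span $B$.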


Example 3.4.10 Consider $\mathcal{I}_4(\mathbb{R})$, the vector space of 4 × 4 grayscale images, and the set

X = {Image A, Image B, Image C, Image 4}.

$X$ is not a basis for $\mathcal{I}_4(\mathbb{R})$. The pixel-wise definitions of addition and scalar multiplication lead to a system of 16 equations in four unknowns when using the linear dependence relation. The system has only the trivial solution so $X$ is linearly independent. But, Span $X \neq \mathcal{I}_4(\mathbb{R})$.


3.4.3 Constructing a Basis 


While we can now appreciate that a basis provides an efficient way to describe all possible vectors in 
a vector space, it may not be clear just how to find a basis. For some common vector spaces, there are 
“standard” bases that are frequently convenient to use for applications. Some of these standard bases 
are listed in the table below. You should be able to verify that each of these does indeed represent a 
basis. Can you see why each is in fact called the standard basis of the corresponding vector space? 
Standard bases for image spaces, heat states, and other applications are explored in the Exercises. 
Vector Space                                   Standard Basis
$(\mathbb{R}^n, +, \cdot)$                     $\{e_1, e_2, \ldots, e_n\}$, where $e_i = (0, \ldots, 1, \ldots, 0)^T$ with 1 in position $i$.
$(\mathcal{P}_n(\mathbb{R}), +, \cdot)$        $\{1, x, x^2, \ldots, x^n\}$
$(\mathcal{M}_{m \times n}(\mathbb{R}), +, \cdot)$   $\{M_{11}, M_{21}, \ldots, M_{mn}\}$, where $M_{jk}$ is the matrix of all zeros except for a 1 in the $j$th row and $k$th column.


If we do not have a standard basis or we wish to use a different basis, then we must construct one. 
There are two main methods for constructing a basis. We can either start with a spanning set, and 
selectively remove vectors until the set is linearly independent (but still spans the space), or we can 
start with a linearly independent subset of the space and selectively add vectors until the set spans the 
space (and is still linearly independent). Both methods make use of the definition of a basis and both 
will seem familiar. 


Method #1: Spanning Set Reduction. Suppose we have a spanning set X for a vector space V. We 
might hope to find a basis B for V as a subset of X. If so, we must have Span B = Span X = V and B 
is linearly independent. In other words, is there a linearly independent subset of X that has the same 
span as X? The following theorem answers this question. 


Theorem 3.4.11 
Let $(V, +, \cdot)$ be a vector space and $X$ a finite subset of $V$ such that Span $X = V$. Then, there exists a basis $\mathcal{B}$ for $V$ such that $\mathcal{B} \subseteq X$.


Proof. Let $X = \{u_1, u_2, \ldots, u_k\}$ and Span $X = V$. We show that there must exist a linearly independent subset $\mathcal{B}$ of $X$ such that Span $\mathcal{B} = V$.

If $X$ is linearly independent then $\mathcal{B} = X$ is a basis for $V$. (This is true even if $X = \emptyset$. If this is the case, since Span $X = V$, it must be that $V = \{0\}$, so $X$ is a basis of $V$.)

If $X$ is not linearly independent, there exists a vector in $X$ which can be written as a linear combination of the other vectors in $X$. Without loss of generality, suppose $u_k = \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_{k-1} u_{k-1}$, and consider an arbitrary vector $v$ in $V$. Observe:

\[
\begin{aligned}
v &= a_1 u_1 + a_2 u_2 + \cdots + a_{k-1} u_{k-1} + a_k u_k \\
  &= a_1 u_1 + a_2 u_2 + \cdots + a_{k-1} u_{k-1} + a_k (\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_{k-1} u_{k-1}) \\
  &= (a_1 + a_k \alpha_1) u_1 + (a_2 + a_k \alpha_2) u_2 + \cdots + (a_{k-1} + a_k \alpha_{k-1}) u_{k-1}.
\end{aligned}
\]

So, $v \in$ Span $\{u_1, u_2, \ldots, u_{k-1}\}$. In particular, Span $\{u_1, u_2, \ldots, u_{k-1}\} = V$. Now, if this set is linearly independent then it is also a basis for $V$. If not, repeat the reduction step by eliminating any vector from the set which can be written as a linear combination of the remaining vectors. The process must terminate at some point because we started with a finite number of vectors, and the empty set of vectors is defined to be linearly independent. At the end of this process, we have a basis for $V$.


We now give several examples of this process at work. 


Example 3.4.12 Find a basis for $\mathcal{P}_2(\mathbb{R})$ from among the set $X = \{2x^2, x, x^2 - 1, 3\}$.

We leave it to the reader to first verify that Span $X = \mathcal{P}_2(\mathbb{R})$. Next, we find, if possible, any element of $X$ which can be written as a linear combination of the remaining vectors. We see that $(3) = -3(x^2 - 1) + \frac{3}{2}(2x^2)$. So, Span $X$ = Span $\{2x^2, x, x^2 - 1\}$. This remaining set is linearly independent (we leave it up to the reader to show this). Thus, $B = \{2x^2, x, x^2 - 1\}$ is a basis for $\mathcal{P}_2(\mathbb{R})$.


In the preceding example, we used the equation $(3) = -3(x^2 - 1) + \frac{3}{2}(2x^2)$ to show that the set is linearly dependent. This is equivalent to $3(x^2 - 1) = -3 + \frac{3}{2}(2x^2)$ and also to $(x^2 - 1) = -\frac{1}{3}(3) + \frac{1}{2}(2x^2)$. So, we could, instead, have eliminated the vector $x^2 - 1$ to get a basis for $\mathcal{P}_2(\mathbb{R})$ consisting of $\{2x^2, x, 3\}$.


Example 3.4.13 Given the set $X = \{(4, 0), (1, 2), (3, 1), (1, 0)\} \subseteq \mathbb{R}^2$, find a basis for $\mathbb{R}^2$ that is a subset of $X$.


We leave it to the reader to first verify that Span $X = \mathbb{R}^2$. Next, we eliminate any subset of vectors which can be written as linear combinations of remaining vectors. Notice that

\[
\begin{aligned}
(1, 2) &= -\tfrac{5}{4}(4, 0) + 2(3, 1) \\
(1, 0) &= \tfrac{1}{4}(4, 0) + 0(3, 1).
\end{aligned}
\]

Thus, Span $X$ = Span $\{(4, 0), (3, 1)\}$. Furthermore, since $\{(4, 0), (3, 1)\}$ is linearly independent, $B = \{(4, 0), (3, 1)\}$ is a basis for $\mathbb{R}^2$.
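Spanning Set Reduction can be sketched as a small procedure. This is a minimal sketch rather than the text's own algorithm: the names `span_dim` and `reduce_to_basis` are hypothetical, and the test for redundancy (a vector is dropped when removing it does not change the number of pivots of the matrix whose columns are the current vectors) is one assumed way to decide whether a vector lies in the span of the others. Scanning from the end of the list is an arbitrary choice, so the basis returned for the set $X$ of Example 3.4.13 differs from the one chosen in the example.

```python
from fractions import Fraction

def span_dim(vectors):
    """Count pivots after row reducing the matrix with `vectors` as columns."""
    if not vectors:
        return 0
    rows = [[Fraction(v[i]) for v in vectors] for i in range(len(vectors[0]))]
    pivots = 0
    for col in range(len(vectors)):
        src = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if src is None:
            continue
        rows[pivots], rows[src] = rows[src], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * p for a, p in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots

def reduce_to_basis(spanning_set):
    """Drop vectors lying in the span of the others until none can be dropped."""
    current = list(spanning_set)
    removed = True
    while removed:
        removed = False
        # Scan from the end, so later vectors are considered for removal first.
        for i in reversed(range(len(current))):
            others = current[:i] + current[i + 1:]
            if span_dim(others) == span_dim(current):
                current = others  # current[i] was redundant
                removed = True
                break
    return current

X = [(4, 0), (1, 2), (3, 1), (1, 0)]
basis = reduce_to_basis(X)
print(basis)
```

Here the procedure removes $(1, 0)$ and then $(3, 1)$, leaving $\{(4, 0), (1, 2)\}$, which is another subset of $X$ that is a basis for $\mathbb{R}^2$.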


In Example 3.4.13 above, there are many possible other choices of vectors to eliminate. What other 
combinations of vectors from X would also be bases? 

In Examples 3.4.12 and 3.4.13, we recognized one vector that was a linear combination of the 
others. Let us look at how we can use the tool of matrix reduction to determine which vectors form a 
linearly independent set. 


Example 3.4.14 Let us revisit Example 3.4.13. We can set up the linear dependence relation for the vectors in $X$ as follows. Let $\alpha$, $\beta$, $\gamma$, and $\delta$ be scalars so that

\[ \alpha(4, 0) + \beta(1, 2) + \gamma(3, 1) + \delta(1, 0) = (0, 0). \]

We can then write the system of equations where each equation corresponds to a vector entry. That is, we have the following system of equations.

\[
\begin{array}{rcl}
4\alpha + \beta + 3\gamma + \delta &=& 0 \\
2\beta + \gamma &=& 0.
\end{array}
\qquad (3.11)
\]

We can solve system (3.11) using matrix reduction as follows.

\[
\left(\begin{array}{cccc|c} 4 & 1 & 3 & 1 & 0 \\ 0 & 2 & 1 & 0 & 0 \end{array}\right)
\sim
\left(\begin{array}{cccc|c} 4 & -5 & 0 & 1 & 0 \\ 0 & 2 & 1 & 0 & 0 \end{array}\right).
\qquad (3.12)
\]

It was easier to use columns 3 and 4 (corresponding to the variables $\gamma$ and $\delta$) to get "leading" 1's. The corresponding system of equations is

\[
\begin{array}{rcl}
4\alpha - 5\beta + \delta &=& 0 \\
2\beta + \gamma &=& 0.
\end{array}
\]

So, $\alpha$ and $\beta$ are free variables and $\delta = 5\beta - 4\alpha$ and $\gamma = -2\beta$. Putting this into the linear dependence relation tells us that, for any scalars $\alpha$ and $\beta$,

\[ \alpha(4, 0) + \beta(1, 2) + (-2\beta)(3, 1) + (5\beta - 4\alpha)(1, 0) = (0, 0). \]

We can choose $\alpha = 1$ and $\beta = 0$ and we find that

\[ (4, 0) - 4(1, 0) = (0, 0). \]

That is, $(4, 0)$ is a linear combination of $(1, 0)$. Now, if we choose $\alpha = 0$ and $\beta = 1$ we see that $(1, 2)$ is a linear combination of $(1, 0)$ and $(3, 1)$ as follows

\[ (1, 2) - 2(3, 1) + 5(1, 0) = (0, 0). \]

Thus, Span $X$ = Span$\{(1, 0), (3, 1)\}$. This is also indicated by the "leading" 1's in the reduced echelon form of the matrix given in (3.12).


Example 3.4.15 Consider the heat state example of Section 3.4.1. We constructed subset $Y = \{h_1, h_2, h_3, h_5\}$ from $X$ by eliminating heat states that could be written as linear combinations of vectors in $Y$. Then, we showed in Example 3.4.2 that $Y$ is linearly independent. Thus, $Y$ is a basis for $H_4(\mathbb{R})$.


Method #2: Linearly Independent Set Augmentation. A second strategy for constructing a basis 
is to begin with a linearly independent subset and augment it, retaining linear independence, until it 
becomes a spanning set of the vector space of interest. This strategy is nearly identical to the strategy 
for building a spanning set discussed in Example 3.2.19. We simply modify the procedure to verify 
linear independence of the set as an additional test before accepting each new vector to the set. 

To verify that this method does indeed work to find a basis, we refer the reader to the theorems in 
Section 3.4.4. We delay the verification because we need to first discuss the topic of dimension of a 
vector space. We can, however, use Theorem 3.3.18 to verify that if we have a linearly independent set 
that is not a spanning set, we can add a vector to obtain a new “larger” linearly independent set. 


Example 3.4.17 Find a basis $B$ for $\mathbb{R}^3$.
We begin with $B_0 = \emptyset$. $B_0$ is linearly independent but does not span $\mathbb{R}^3$.
We choose any vector from $\mathbb{R}^3$ to add to the current set $B_0$ which is not in the span of the existing vectors. Since Span $B_0 = \{0\}$, we can choose any nonzero vector. We will choose vector $b_1 = (0, 1, 2)^T$. Let $B_1 = \{(0, 1, 2)^T\}$. $B_1$ is linearly independent but does not span $\mathbb{R}^3$.

We continue by adding another vector from $\mathbb{R}^3$ that is not in the span of $B_1$. In this case, we can add any arbitrary vector that is not a scalar multiple of $b_1 = (0, 1, 2)^T$, say $b_2 = (1, 1, 1)^T$. By Theorem 3.3.18, we know that $B_2 = \{(0, 1, 2)^T, (1, 1, 1)^T\}$ is linearly independent. We next verify that $B_2$ does not span all of $\mathbb{R}^3$. Recall that the vector $v = (x, y, z)^T$ is in the span of $b_1$ and $b_2$ if and only if $v = \alpha b_1 + \beta b_2$ for some $\alpha$ and $\beta$ in $\mathbb{R}$. Hence we row reduce the augmented matrix

\[
(b_1 \; b_2 \mid v) = \left(\begin{array}{cc|c} 0 & 1 & x \\ 1 & 1 & y \\ 2 & 1 & z \end{array}\right)
\sim
\left(\begin{array}{cc|c} 1 & 0 & y - x \\ 0 & 1 & x \\ 0 & 0 & z - 2y + x \end{array}\right).
\]

Hence the system is consistent if and only if $z - 2y + x = 0$; this equation describes a plane in $\mathbb{R}^3$, not all of $\mathbb{R}^3$. Hence we do not yet have a basis.

We continue by adding another vector from $\mathbb{R}^3$ which is not in the span of $B_2$. We must find a vector $(a, b, c)^T$ such that $(a, b, c)^T \neq \alpha_1 (0, 1, 2)^T + \alpha_2 (1, 1, 1)^T$ for any scalars $\alpha_1$ and $\alpha_2$. One such vector is $(1, 0, 0)^T$ because this choice leads to an inconsistent system of equations in coefficients $\alpha_1$ and $\alpha_2$. We have $B_3 = \{(0, 1, 2)^T, (1, 1, 1)^T, (1, 0, 0)^T\}$ which is linearly independent. One can also verify that Span $B_3 = \mathbb{R}^3$.

Thus, $B_3$ is a linearly independent spanning set for $\mathbb{R}^3$. The set $B = B_3$ is a basis for $\mathbb{R}^3$.
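The consistency condition $z - 2y + x = 0$ gives a quick test for whether a candidate vector may be added in the last augmentation step. The sketch below assumes that condition as derived in the example; the function name `in_span_B2` and the list of candidate vectors are made up for illustration.

```python
def in_span_B2(v):
    """Consistency condition from the row reduction: z - 2y + x = 0."""
    x, y, z = v
    return z - 2 * y + x == 0

b1, b2 = (0, 1, 2), (1, 1, 1)
candidates = [(1, 2, 3), (1, 0, 0), (0, 0, 1)]

# Both vectors already in B2 must satisfy the condition.
print(in_span_B2(b1), in_span_B2(b2))
# Candidates that fail the condition can be added while keeping independence.
admissible = [v for v in candidates if not in_span_B2(v)]
print(admissible)
```

The candidate $(1, 2, 3)^T = b_1 + b_2$ is rejected because it lies in Span $B_2$, while $(1, 0, 0)^T$, the vector chosen in the example, is admissible by Theorem 3.3.18.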


Example 3.4.18 Find a basis for the vector space of 7-bar LCD characters, D(Z2). 
Using the method of Linearly Independent Set Augmentation, we can see that a basis is constructed 
by the following set: 


B = {seven 7-bar LCD character images}.


Each vector cannot be written as a linear combination of those that precede it. Once these seven 
characters are included in the set, it becomes a spanning set for D(Z2). (You are asked to verify these 
claims in Exercise 36.) 


3.4.4 Dimension 


Looking back at the examples of previous sections, we see that different bases for the same vector 
space seem to have the same number of vectors. We saw two different bases for $\mathbb{R}^2$, each with two
vectors, and two different bases for $\mathbb{R}^3$, each with three vectors. We saw several different bases for
P2(R), each with three vectors. 

Since a basis is a minimal spanning set (see Exercise 31), it should not be surprising that any two 
bases for a vector space should consist of the same number of vectors. Is this always true? If so, then 
the number of basis vectors is an important property of the vector space itself. Since each basis vector 
is linearly independent with respect to all others, larger bases span richer vector spaces.

In the following theorem, we prove that, for a finite-dimensional vector space, all bases have the
same number of elements.


Definition 3.4.19 


A finite-dimensional vector space is one that can be spanned by a finite set of vectors. 


Theorem 3.4.20 
Let $(V, +, \cdot)$ be a finite-dimensional vector space with bases $B_1 = \{v_1, v_2, \ldots, v_n\}$ and $B_2 =
\{u_1, u_2, \ldots, u_m\}$. Then $n = m$.


Proof. Suppose both $B_1$ and $B_2$ are bases for $V$ consisting of $n$ and $m$ elements, respectively. We
show that $m = n$ "by contradiction." That is, we will assume that $m \neq n$ and show that this leads to a
logical inconsistency. Without loss of generality, suppose $m > n$. We will produce our contradiction
by showing that $B_2$ cannot be linearly independent (and therefore could not be a basis). We write each
of the elements of $B_2$ as a linear combination of elements of $B_1$. Specifically, since $B_2$ is a subset of
$V$ and $B_1$ spans $V$, we know that there exist $\alpha_{i,j}$ for $1 \le i \le m$ and $1 \le j \le n$ so that

\begin{align*}
u_1 &= \alpha_{1,1} v_1 + \alpha_{1,2} v_2 + \cdots + \alpha_{1,n} v_n \\
u_2 &= \alpha_{2,1} v_1 + \alpha_{2,2} v_2 + \cdots + \alpha_{2,n} v_n \\
&\;\;\vdots \\
u_m &= \alpha_{m,1} v_1 + \alpha_{m,2} v_2 + \cdots + \alpha_{m,n} v_n.
\end{align*}

To determine whether $B_2$ is linearly independent, suppose there is a linear dependence relation

\[
\beta_1 u_1 + \beta_2 u_2 + \cdots + \beta_m u_m = 0.
\]

We solve for all possible $\beta_1, \beta_2, \ldots, \beta_m$ that satisfy this equation. If we replace $u_1, u_2, \ldots, u_m$ with
the linear combinations above and rearrange terms, we get

\begin{align*}
&(\beta_1 \alpha_{1,1} + \beta_2 \alpha_{2,1} + \cdots + \beta_m \alpha_{m,1}) v_1 \\
&\quad + (\beta_1 \alpha_{1,2} + \beta_2 \alpha_{2,2} + \cdots + \beta_m \alpha_{m,2}) v_2 \\
&\quad\;\; \vdots \\
&\quad + (\beta_1 \alpha_{1,n} + \beta_2 \alpha_{2,n} + \cdots + \beta_m \alpha_{m,n}) v_n = 0.
\end{align*}

Since $B_1$ is a basis, we get that the coefficients of $v_1, v_2, \ldots, v_n$ are all zero. That is,

\begin{align*}
\beta_1 \alpha_{1,1} + \beta_2 \alpha_{2,1} + \cdots + \beta_m \alpha_{m,1} &= 0 \tag{3.13} \\
\beta_1 \alpha_{1,2} + \beta_2 \alpha_{2,2} + \cdots + \beta_m \alpha_{m,2} &= 0 \tag{3.14} \\
&\;\;\vdots \tag{3.15} \\
\beta_1 \alpha_{1,n} + \beta_2 \alpha_{2,n} + \cdots + \beta_m \alpha_{m,n} &= 0. \tag{3.16}
\end{align*}

This is a system of equations in the variables $\beta_1, \beta_2, \ldots, \beta_m$ (recall that we want to know what possible
values $\beta_1, \beta_2, \ldots, \beta_m$ can take on). We know that this system has a solution because it is homogeneous.
But, because there are $m$ variables and only $n$ equations, this system must have infinitely many solutions,
and in particular a nontrivial one. This means that $B_2$ cannot be linearly independent and so it cannot be a basis.
Thus, if both $B_1$ and $B_2$ are bases of the same vector space, then $n = m$.
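
The key step of the proof, that a homogeneous system with more unknowns than equations has a nontrivial solution, can be seen on a small instance. This is only an illustrative sketch: the coefficients `alpha` are made-up values (with $m = 3$ and $n = 2$), and `nontrivial_solution` is a hypothetical helper that row reduces with exact fractions and sets one free variable to 1.

```python
from fractions import Fraction

def nontrivial_solution(A):
    """Return a nonzero x with A x = 0, assuming A has more columns than rows."""
    rows = [[Fraction(x) for x in row] for row in A]
    m = len(rows[0])
    pivots = []  # pivots[i] is the pivot column of row i
    r = 0
    for c in range(m):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [a / pivot for a in rows[r]]
        for i in range(len(rows)):
            if i != r:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    # More columns than rows guarantees at least one free (non-pivot) column.
    free = next(c for c in range(m) if c not in pivots)
    x = [Fraction(0)] * m
    x[free] = Fraction(1)
    for i, c in enumerate(pivots):
        x[c] = -rows[i][free]
    return x

# Made-up coefficients: row i holds (alpha_{i,1}, alpha_{i,2}), so that
# u_1 = v_1, u_2 = 2 v_1 + v_2, u_3 = 3 v_1 + v_2.
alpha = [[1, 0], [2, 1], [3, 1]]
# Equation j of the system reads: sum over i of beta_i * alpha_{i,j} = 0.
A = [[alpha[i][j] for i in range(3)] for j in range(2)]
beta = nontrivial_solution(A)
print(beta)  # [Fraction(-1, 1), Fraction(-1, 1), Fraction(1, 1)]
```

Here $-u_1 - u_2 + u_3 = 0$, so three vectors written in terms of a two-element basis are linearly dependent.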


Because the number of elements in a basis is unique to the vector space, we give it a name. 


Definition 3.4.21 


Given a vector space $(V, +, \cdot)$ and a basis $B$ for $V$ with $n$ elements, we say that the dimension of
$V$ is $n$ and write $\dim V = n$. If $V$ has no finite basis, we say that $V$ is infinite-dimensional.


When we proved Theorem 3.4.20, we actually showed that an $n$-dimensional vector space $V$ has
no linearly independent set with more than $n$ elements. That is, if we have a set $\{u_1, u_2, \ldots, u_k\} \subset V$
and $k > n$, then we automatically know that the set is linearly dependent. This gives us another tool to
make a quick check of linear dependence and may save us time.


Corollary 3.4.22 
Let S be a k-element subset of n-dimensional vector space (V, +, -). If k > n, then S is linearly 
dependent. 


Corollary 3.4.22 also justifies our earlier assertion that Method #2 for constructing bases (Linearly 
Independent Set Augmentation) works, at least in the setting of finite-dimensional vector spaces. 
Indeed, one iteratively adds vectors to a linearly independent set, each time maintaining linear 
independence. We continue this until we arrive at a set containing n linearly independent vectors. 
Corollary 3.4.22 guarantees that adding another vector beyond n will create a linearly dependent set. 
Hence, every vector in the space must lie in the span of these n vectors, so those n vectors form a basis. 

The contrapositive of Corollary 3.4.22 is also true: if $S$ is linearly independent, then $k \le n$. So,
then, a basis is a largest possible linearly independent set.


Lemma 3.4.23 
Let $S$ be a $k$-element subset of an $n$-dimensional vector space $(V, +, \cdot)$. If $k < n$, then $S$ is not a
spanning set.


Proof. Suppose $B = \{v_1, v_2, \ldots, v_n\}$ is a basis for $V$ and $S = \{s_1, s_2, \ldots, s_k\}$. We want to show
that there is an element $v \in V$ with $v \notin \operatorname{Span} S$. We will assume also that $S$ spans $V$ and look for a
contradiction. We break this proof into two cases: Case 1: $S$ is linearly independent and Case 2: $S$ is
linearly dependent.

Case 1: Suppose $S$ is linearly independent. If $S$ spans $V$, then $S$ is a basis for $V$, and so by Theorem 3.4.20,
$k = n$. Since $k < n$, we have found a contradiction.

Case 2: Suppose $S$ is linearly dependent. Then some of the vectors in $S$ can be written as linear
combinations of the others. This means there is a subset $S' \subset S$ that is linearly independent and
$\operatorname{Span} S' = \operatorname{Span} S = V$. But then $S'$, by definition, is a basis, and it has fewer than $n$ elements. Again, this
contradicts Theorem 3.4.20.

Thus, S cannot be a spanning set. 


Perhaps even more surprisingly, in an n-dimensional vector space, any set of n linearly independent 
vectors is a spanning set, and hence a basis. 


Theorem 3.4.24 
Let $(V, +, \cdot)$ be an $n$-dimensional vector space and $S \subset V$. Then $S$ is a basis for $V$ if and only if $S$ is
linearly independent and contains exactly $n$ elements.


Proof. ($\Rightarrow$) Suppose $S$ is a basis for the $n$-dimensional vector space $V$. Then $S$ is linearly independent
and $\operatorname{Span}(S) = V$. By Lemma 3.4.23, $S$ must have at least $n$ elements. Also, by Corollary 3.4.22, $S$
cannot have more than $n$ elements. Thus, $S$ contains exactly $n$ elements.

($\Leftarrow$) Now suppose $S$ is a linearly independent set containing exactly $n$ vectors from the $n$-dimensional
vector space $V$. If $S$ is not a basis for $V$, then $S$ is not a spanning set of $V$, so some vector of $V$ lies outside
$\operatorname{Span} S$. Adding that vector to $S$ gives a linearly independent set with $n + 1$ elements, which contradicts
Corollary 3.4.22. Thus, $S$ is a spanning set (and basis) for $V$.


In the next few examples, we illustrate how we find the dimension of various vector spaces. 


Example 3.4.25 Because Y is a basis for H4(R) with 4 elements, H4(R) is 4-dimensional. 


Example 3.4.26 Let

\[
V = \left\{ (x, y, z)^T \;\middle|\; x + y + z = 0,\ 2x + y - 4z = 0,\ 3x + 2y - 3z = 0 \right\}.
\]

Notice that $V$ is the solution set of a homogeneous system of equations, so $V$ is a
subspace of $\mathbb{R}^3$ (and therefore a vector space). We show next that $V$ has dimension 1.

First, we will represent $V$ as the span of a set of vectors, and check the linear dependence of this
set. (If the spanning set is not linearly independent, we can use Spanning Set Reduction to pare it down
into a basis.) We first reduce the matrix corresponding to the system of equations

\begin{align*}
x + y + z &= 0 \\
2x + y - 4z &= 0 \\
3x + 2y - 3z &= 0.
\end{align*}

The reduction results in the reduced echelon form

\[
\left(\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 2 & 1 & -4 & 0 \\ 3 & 2 & -3 & 0 \end{array}\right)
\sim
\left(\begin{array}{ccc|c} 1 & 0 & -5 & 0 \\ 0 & 1 & 6 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).
\]

Thus, the solution is of the form

\[
(x, y, z)^T = z\,(5, -6, 1)^T,
\]

where $z$ can be any real number. This means we can rewrite $V$ as below

\[
V = \left\{ z\,(5, -6, 1)^T \;\middle|\; z \in \mathbb{R} \right\} = \operatorname{Span}\left\{ (5, -6, 1)^T \right\}.
\]

The set

\[
B = \left\{ (5, -6, 1)^T \right\}
\]

is linearly independent and spans $V$. Thus it is a basis for $V$. Since $B$ has one element, $V$ has
dimension 1.
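
As a check on this example, the row reduction can be repeated with exact arithmetic. The sketch below is an illustration only; `rref` is a hypothetical helper written for this purpose, not a routine from the text.

```python
from fractions import Fraction

def rref(matrix):
    """Reduced echelon form by exact Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0  # index of the next pivot row
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [a / pivot for a in rows[r]]
        for i in range(len(rows)):
            if i != r:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return rows

# Coefficient matrix of x + y + z = 0, 2x + y - 4z = 0, 3x + 2y - 3z = 0.
A = [[1, 1, 1], [2, 1, -4], [3, 2, -3]]
R = rref(A)

# Two pivot rows and three unknowns leave one free variable (z).
pivot_rows = sum(1 for row in R if any(x != 0 for x in row))
free_variables = len(A[0]) - pivot_rows

# The proposed basis vector satisfies every equation of the system.
b = (5, -6, 1)
residuals = [sum(a * x for a, x in zip(row, b)) for row in A]
print(free_variables, residuals)  # 1 [0, 0, 0]
```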


Example 3.4.27 Let $V = \{ax^2 + bx + c \mid a + b - 2c = 0\}$. We show that $V$ is a subspace of $P_2$.
Indeed, $0x^2 + 0x + 0 \in V$, and if $v_1 = a_1x^2 + b_1x + c_1$ and $v_2 = a_2x^2 + b_2x + c_2$ are vectors
in $V$ and $\alpha, \beta$ are scalars, then

\[
a_1 + b_1 - 2c_1 = 0 \quad\text{and}\quad a_2 + b_2 - 2c_2 = 0.
\]

Now,

\[
\alpha v_1 + \beta v_2 = (\alpha a_1 + \beta a_2)x^2 + (\alpha b_1 + \beta b_2)x + \alpha c_1 + \beta c_2.
\]

Also, we see that

\begin{align*}
\alpha a_1 + \beta a_2 + \alpha b_1 + \beta b_2 - 2\alpha c_1 - 2\beta c_2 &= \alpha(a_1 + b_1 - 2c_1) + \beta(a_2 + b_2 - 2c_2) \\
&= 0 + 0 \\
&= 0.
\end{align*}

Thus, $\alpha v_1 + \beta v_2 \in V$. Therefore, $V$ is a subspace of $P_2$. Now, below, we show that $V$ is 2-dimensional.
Indeed, we can rewrite $V$:

\begin{align*}
V &= \{(2c - b)x^2 + bx + c \mid b, c \in \mathbb{R}\} \\
&= \{(-x^2 + x)b + (2x^2 + 1)c \mid b, c \in \mathbb{R}\} \\
&= \operatorname{Span}\{-x^2 + x,\ 2x^2 + 1\}.
\end{align*}

Now, we can see that the elements of the set $B = \{-x^2 + x,\ 2x^2 + 1\}$ are not scalar multiples of
one another, so $B$ is linearly independent. Thus, $B$ is a basis for $V$. Since $B$ has two elements, $V$ is
2-dimensional.
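
A quick numerical sanity check of this example is sketched below. It assumes a polynomial $ax^2 + bx + c$ is stored as the coefficient triple `(a, b, c)`; the helper names `in_V` and `combo` are hypothetical.

```python
def in_V(p):
    """Membership test for V: the coefficients satisfy a + b - 2c = 0."""
    a, b, c = p
    return a + b - 2 * c == 0

b1 = (-1, 1, 0)  # -x^2 + x
b2 = (2, 0, 1)   # 2x^2 + 1

def combo(b, c):
    """The linear combination b*b1 + c*b2, as a coefficient triple."""
    return tuple(b * s + c * t for s, t in zip(b1, b2))

# Both basis vectors lie in V, and every combination has the form
# (2c - b)x^2 + bx + c, matching the rewriting of V in the example.
samples = [(b, c) for b in range(-3, 4) for c in range(-3, 4)]
all_in_V = all(in_V(combo(b, c)) for b, c in samples)
all_match = all(combo(b, c) == (2 * c - b, b, c) for b, c in samples)
print(in_V(b1), in_V(b2), all_in_V, all_match)  # True True True True
```

Because the last two entries of `combo(b, c)` are `b` and `c` themselves, the combination is the zero polynomial only when both scalars are zero, which is the linear independence claimed in the example.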


What we see is that, in order to find the dimension of a vector space, we need to find a basis and
count the elements in the basis.

Example 3.4.28 Consider the vector space $V = \mathcal{I}_{512 \times 512 \times 512}$ of grayscale brain images represented as
voxels (cubic pixels) in a $512 \times 512 \times 512$ array. Let $b_{i,j,k}$ be the brain image of all zero values except
a 1 at array location $(i, j, k)$. The set of images

\[
B = \{ b_{i,j,k} \mid 1 \le i, j, k \le 512 \}
\]

is a basis for $V$ with $512 \times 512 \times 512 = 134{,}217{,}728$ elements. Thus, $V$ is a vector space of dimension
134,217,728.


Example 3.4.29 Consider the vector space V of color images from a 12-megapixel phone camera 
(see Section 2.4, Exercise 6). Images are represented on a rectangular grid with 3024 rows and 4032 
columns, such as Figure 3.9. The color in each pixel is determined by the relative brightness of red, 
green, and blue light specified by three scalars. 

Let $p_{i,j,k}$ be the image of all zero values except a 1 in the $i$th row, $j$th column, and $k$th color
(red, green, blue). The set of images

\[
B = \{ p_{i,j,k} \mid 1 \le i \le 3024,\ 1 \le j \le 4032,\ k = 1, 2, 3 \}
\]


is a basis for V with 3024 x 4032 x 3 = 36,578,304 elements. Thus, V is a vector space of dimension 
36,578,304. 
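
The dimension counts for the two image spaces are just products of the array sizes, one basis image per array entry. A minimal check (the variable names are illustrative only):

```python
from math import prod

# One basis image per voxel of a 512 x 512 x 512 brain scan.
brain_dim = prod((512, 512, 512))

# One basis image per (row, column, color channel) of the phone camera image.
photo_dim = prod((3024, 4032, 3))

print(f"{brain_dim:,}")  # 134,217,728
print(f"{photo_dim:,}")  # 36,578,304
```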


Fig. 3.9 An example of a 12-megapixel phone camera image with 3024-pixel rows, 4032-pixel columns, and three color
channels (red, green, blue).


3.4.5 Properties of Bases


We began this section by considering $X = \{h_1, h_2, h_3, h_4, h_5, h_6\}$, a spanning set for $H_4(\mathbb{R})$, the
vector space of heat states with $m = 4$. This means that any heat state in $H_4(\mathbb{R})$ can be written as a linear
combination of these six representative heat states. This, in itself, provides a compact way to catalogue
and represent heat states. But we went further by showing that $Y = \{h_1, h_2, h_3, h_5\}$ is also a spanning
set and has fewer representative states. So, for efficient descriptions, Y makes a better catalogue. Then 
we showed that no catalogue can be smaller than Y. Any smallest such catalogue is called a basis, and 
the number of elements in a basis is called the dimension of the vector space that the basis spans. 

However, we also saw that a basis for a vector space need not be unique. Much of our work 
with applications and their descriptions will center around good choices of bases. This choice can 
dramatically affect our ability to (a) interpret our results and descriptions and (b) efficiently perform 
computations. 

There are some important questions we have yet to answer. Because we have seen that a vector 
space is easily described by very few vectors when it has a basis, we want to know that the basis exists. 
The good news is that every vector space has a basis. In the next section we address this question with
a proof of the existence of a basis. We also address the question of uniqueness of representing a vector
as a linear combination of basis elements. Can we find more than one representation given a particular 
basis? This question is important because alternate representations might make it difficult to create 
algorithms that answer questions about the vector space. 


Existence of a Basis 


As just mentioned, it is important to know that every vector space has a basis. The next theorem and 
its proof give us this result. The idea is that, because of the way we defined a finite-dimensional vector
space, we know that there is a finite spanning set on which we may apply the spanning set reduction
algorithm to get a basis. We state the result here as a corollary to Theorem 3.4.11. 


Corollary 3.4.30 
Every finite-dimensional vector space has a basis. 


Proof. Suppose V is a finite-dimensional vector space. Then V is spanned by some finite set X. By 
Theorem 3.4.11, there exists a basis for V that is a subset of X; hence V has a basis. 


Most examples in this section have involved finite-dimensional vector spaces. We have considered 
vector spaces $\mathbb{R}^n$, $P_n(\mathbb{R})$, $D(Z_2)$, $M_{m \times n}(\mathbb{R})$ as well as some image spaces, all of which are finite-
dimensional. 

Although we will mostly refrain from discussing infinite-dimensional vector spaces in the remainder
of this book, we make a few observations here. One can show^7 that every infinite-dimensional vector
space has a basis (that is, a maximal linearly independent subset) as well, though it may not be clear
how to construct such a basis. Sometimes, however, a basis for an infinite-dimensional vector space
can be readily constructed using the method of Linearly Independent Set Augmentation, extended
to account for non-finite spanning sets. The following examples are two infinite-dimensional vector
spaces for which we can easily construct infinite bases.


7 The proof that every vector space (including infinite-dimensional vector spaces) has a basis requires a tool called Zorn’s
Lemma, which is beyond the scope of this course. The technique of Linearly Independent Set Augmentation will allow
one to create larger and larger linearly independent sets, but it is not clear how to proceed with this process ad infinitum.
Zorn’s Lemma allows one to circumvent this issue (and also allows for uncountable bases, which a sequential application
of Linearly Independent Set Augmentation would not accommodate).


Example 3.4.31 (Infinite-Dimensional Vector Space) Consider the set of all polynomials $P(\mathbb{R})$. Every
polynomial has a finite number of terms, and so we begin listing simple 1-term polynomial vectors to
create the standard basis $\{1, x, x^2, x^3, \ldots\}$. Although we do not formally prove it here, we note that for
$k \in \mathbb{N}$, $x^k$ cannot be represented as a finite sum of other 1-term polynomials; hence the set is linearly
independent.


Example 3.4.32 (Infinite-Dimensional Vector Space) Consider $S(\mathbb{R})$, the set of sequences of real
numbers with a finite number of nonzero entries. In a very similar way, we can begin listing basis
elements $\{s_1, s_2, s_3, \ldots\}$ as follows

\begin{align*}
s_1 &= (1, 0, 0, 0, \ldots) \\
s_2 &= (0, 1, 0, 0, \ldots) \\
s_3 &= (0, 0, 1, 0, \ldots)
\end{align*}

where the $k$th vector in the basis is the sequence whose $k$th term is 1 and all other terms are 0. Again,
though we will not prove it formally, $s_k$ cannot be written as a linear combination of finitely many
other sequences, so $\{s_1, s_2, s_3, \ldots\}$ is linearly independent. In addition, if a sequence is in $S(\mathbb{R})$, it has
finitely many nonzero entries, so it can be written as a linear combination of the vectors corresponding
to its nonzero entries.


On the other hand, a space such as the vector space of all sequences of real numbers does not have
such a simple basis. To see this, observe that the sequence vector $(1, 1, 1, \ldots)$ cannot be written as a
linear combination of the vectors $s_1, s_2, s_3, \ldots$ (recall Definition 3.1.1). If we were to then add this
vector to the set, the result still would not be a basis^8.


Uniqueness of Linear Combinations 


The concept of basis evolved from finding efficient spanning sets used as representatives of all possible 
vectors in a vector space. However, we could think of a basis from a different viewpoint. We could 
instead look for a set $B$ so that every vector in $V$ has a unique representation as a linear combination
of vectors in B. Does such a set always exist? The answer, amazingly enough, is “yes.” 

Consider the vector space $\mathcal{I}_{108 \times 108}(\mathbb{R})$, the set of $108 \times 108$ grayscale images on a regular grid.
Suppose we are interested in brain scan images in this space such as those shown in Figure 3.10. We
could hypothetically take a brain scan of every living human and have a complete catalogue of images.
This database would contain about 8 billion images. At first glance, it might seem reasonable to conclude
that such a database would be sufficient for describing all possible brain images in $\mathcal{I}_{108 \times 108}(\mathbb{R})$.

However, there are three main problems with this conclusion. 


8 A typical proof to show that no such basis is found uses a diagonalization argument. We encourage the interested reader
to look at a similar argument made by Cantor.

Fig. 3.10 Six example brain scan images in the vector space $\mathcal{I}_{108 \times 108}(\mathbb{R})$.


• It may still be possible for the brain image of a new person not to be in the span of the 8 billion
existing images.

• Even if a new brain image is in this span, it may not be simply described in this catalogue.

• Even if a new brain image is in this span, it may not be uniquely described with this catalogue.


In fact, as few as 11,664 images are needed to generate (with linear combinations) the entire set
of possible brain images. Moreover, every brain image is uniquely described using this catalogue. It
turns out that any catalogue that is comprehensive (generates the entire space via linear combinations)
and gives a unique representation of the brain images in the space $\mathcal{I}_{108 \times 108}(\mathbb{R})$ is in fact a basis for
$\mathcal{I}_{108 \times 108}(\mathbb{R})$. This useful characterization is true more generally for vector spaces.


Theorem 3.4.33 
Let $X = \{u_1, u_2, \ldots, u_n\}$ be a subset of vector space $(V, +, \cdot)$. Then $X$ is a basis for $V$ if and
only if each vector in $V$ is uniquely expressed as a linear combination of vectors in $X$.


Proof. Let $X = \{u_1, u_2, \ldots, u_n\}$ be a subset of vector space $(V, +, \cdot)$.

($\Rightarrow$) Suppose $X$ is a basis for $V$ and $v$ is an arbitrary vector in $V$. Suppose, by way of contradiction,
that $v$ can be expressed as a linear combination of vectors in $X$ in more than one way. In particular,
suppose

\[
v = \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n
\]

and

\[
v = \beta_1 u_1 + \beta_2 u_2 + \cdots + \beta_n u_n,
\]

where $\beta_k \neq \alpha_k$ for some $1 \le k \le n$. Then we have

\[
0 = v - v = (\alpha_1 - \beta_1) u_1 + (\alpha_2 - \beta_2) u_2 + \cdots + (\alpha_n - \beta_n) u_n.
\]

At least one coefficient $(\alpha_k - \beta_k) \neq 0$, which implies that $X$ is linearly dependent, a contradiction. So
$\alpha_k = \beta_k$ for all $k$, and the expression of $v$ as a linear combination of vectors in $X$ is unique.

($\Leftarrow$) We will prove the contrapositive; that is, we will show that if $X$ is not linearly independent,
then there is not a unique representation for every vector in terms of elements of $X$. In other words, we
will prove that if $X$ is linearly dependent, then there exists a vector $v \in V$ that can be represented in
two ways as linear combinations of vectors from $X$. Toward this end, suppose that $X$ is linearly
dependent. Then there are scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$, not all zero, so that

\[
0 = \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n.
\]

Without loss of generality, suppose $\alpha_1 \neq 0$. Then

\[
u_1 = -(\alpha_2/\alpha_1) u_2 - (\alpha_3/\alpha_1) u_3 - \cdots - (\alpha_n/\alpha_1) u_n
\]

and

\[
u_1 = 1 \cdot u_1.
\]

In other words, the vector $u_1$ can be expressed as two distinct linear combinations of vectors in $X$.
Since $X \subset V$, we have a vector $u_1$ in $V$ that can be expressed as two distinct linear combinations of
vectors in $X$.

Therefore, if each vector in $V$ is uniquely expressed as a linear combination of vectors in $X$, then
$X$ must be linearly independent. Also, since each vector can be expressed as a linear combination of
vectors in $X$, $X$ is a spanning set for $V$; hence $X$ is a basis.


The following two examples show how this theorem can be useful in the vector spaces $P_2(\mathbb{R})$ and
$D(Z_2)$.


Example 3.4.34 The vector space $P_2(\mathbb{R})$ has dimension 3, so any basis has three elements. However,
the three-element set $X = \{x^2 + x,\ x + 1,\ x^2 - 1\}$ is not a basis. We could show that $X$ is linearly
dependent. Or, we could notice that the vector $x^2 + 2x + 1$ is not uniquely expressed in terms of the
elements of $X$:

\[
x^2 + 2x + 1 = 1(x^2 + x) + 1(x + 1) + 0(x^2 - 1),
\]

and

\[
x^2 + 2x + 1 = 2(x^2 + x) + 0(x + 1) - 1(x^2 - 1).
\]
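
The non-uniqueness in this example is easy to confirm by arithmetic on coefficients. The sketch below assumes each polynomial is stored as its coefficient triple for $(x^2, x, 1)$; the helper name `combine` is hypothetical, and the second representation used is one valid choice, $2(x^2 + x) + 0(x + 1) - 1(x^2 - 1)$.

```python
# Elements of X as coefficient triples (x^2, x, 1).
X = [(1, 1, 0),   # x^2 + x
     (0, 1, 1),   # x + 1
     (1, 0, -1)]  # x^2 - 1

def combine(coeffs):
    """The linear combination of the elements of X with the given scalars."""
    return tuple(sum(c * p[i] for c, p in zip(coeffs, X)) for i in range(3))

target = (1, 2, 1)  # x^2 + 2x + 1

first = combine((1, 1, 0))
second = combine((2, 0, -1))
print(first == target, second == target)  # True True

# Subtracting the two representations gives a nontrivial dependence relation,
# which is why X cannot be linearly independent.
difference = combine((1, -1, -1))
print(difference)  # (0, 0, 0)
```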


Example 3.4.35 The set $B$ in Example 3.4.18 is a basis for the vector space $D(Z_2)$ because any
character is uniquely expressed in terms of the basis vectors. To show this, let $B = \{b_1, b_2, b_3, b_4, b_5, b_6, b_7\}$
in the order shown. Consider the representation of an arbitrary character, say $v = \alpha_1 b_1 + \cdots + \alpha_7 b_7$.
We will show that there is a unique choice of scalar coefficients.

Notice that $\alpha_7$ is entirely dependent on whether the lowest horizontal bar of $v$ is green ($\alpha_7 = 1$) or
not green ($\alpha_7 = 0$). Remember that the scalar field is $Z_2 = \{0, 1\}$, so we have uniquely determined $\alpha_7$.

Next, notice that $\alpha_6$ is determined by whether the lower-right vertical bar of $v$ is green or not green
and by the value of $\alpha_7$. In particular, if this bar is green in $v$ then we must have $1 = \alpha_6 + \alpha_7$. And if
this bar is not green in $v$ then we must have $0 = \alpha_6 + \alpha_7$. At each step we must be careful to perform
all arithmetic in $Z_2$. So, we have uniquely determined $\alpha_6$.

Next, notice that $\alpha_5$ is determined by whether the lower-left vertical bar of $v$ is green or not green and
by the values of $\alpha_6$ and $\alpha_7$. In particular, if this bar is green in $v$ then we must have $1 = \alpha_5 + \alpha_6 + \alpha_7$.
And if this bar is not green in $v$ then we must have $0 = \alpha_5 + \alpha_6 + \alpha_7$. So, we have uniquely determined
$\alpha_5$.

Continuing this process uniquely determines all coefficients $\alpha_1, \ldots, \alpha_7$. Thus, by Theorem 3.4.33,
$B$ is a basis for $D(Z_2)$.


Theorem 3.3.14 allows us to derive the following key relationships between a basis for $\mathbb{R}^n$, linear
combinations of vectors expressed in terms of this basis, and the matrix equation corresponding to 
such linear combinations. 


Corollary 3.4.36 
Let $M$ be an $n \times n$ matrix with columns $v_1, v_2, \ldots, v_n$. Then, $B = \{v_1, v_2, \ldots, v_n\}$ is a basis
for $\mathbb{R}^n$ if and only if $Mx = b$ has a unique solution $x \in \mathbb{R}^n$ for every $b \in \mathbb{R}^n$.


Proof. ($\Rightarrow$) Suppose $M$ is an $n \times n$ matrix with columns $v_1, v_2, \ldots, v_n$, and $B = \{v_1, v_2, \ldots, v_n\}$ is a
basis for $\mathbb{R}^n$. Then, $B$ is linearly independent. Thus, by Theorem 3.3.14, $Mx = b$ has a unique solution
$x \in \mathbb{R}^n$ for every $b \in \mathbb{R}^n$.

($\Leftarrow$) Suppose $M$ is an $n \times n$ matrix with columns $v_1, v_2, \ldots, v_n$, and let $B = \{v_1, v_2, \ldots, v_n\}$.
Further, suppose $Mx = b$ has a unique solution $x \in \mathbb{R}^n$ for every $b \in \mathbb{R}^n$. By Theorem 3.3.14, $B$ is
linearly independent. Thus, by Theorem 3.4.24, $B$ is a basis for $\mathbb{R}^n$.


By Corollary 3.1.32, we can equivalently restate the previous result in terms of the matrix equation 
Mx =0. 


Corollary 3.4.37 
Let $M$ be an $n \times n$ matrix with columns $v_1, v_2, \ldots, v_n$. Then, $B = \{v_1, v_2, \ldots, v_n\}$ is a basis
for $\mathbb{R}^n$ if and only if $Mx = 0$ has only the trivial solution $x = 0$.


We conclude with two examples using these corollaries for the case n = 3. 
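Example 3.4.38 Consider the matrix equation $Mx = b$ where

\[
M = \begin{pmatrix} 1 & 1 & -2 \\ -2 & 1 & 1 \\ -1 & 2 & -1 \end{pmatrix}
\quad\text{and}\quad
b = \begin{pmatrix} -5 \\ 7 \\ 2 \end{pmatrix}.
\]

Notice that

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -4 \\ -1 \\ 0 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 3 \\ 4 \end{pmatrix}
\]

are both solutions to $Mx = b$. Thus, by Corollary 3.4.36, the set of column vectors of $M$ is not a basis
for $\mathbb{R}^3$. We can also say that, because there are three column vectors, we know that they must form a
linearly dependent set. (Recall Theorem 3.4.24.)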
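
The two solutions in this example can be verified by direct multiplication. In the sketch below, `matvec` is a hypothetical helper, and the first solution is taken to be $(-4, -1, 0)^T$, the value of $x$ forced by the three equations when $y = -1$ and $z = 0$.

```python
def matvec(M, x):
    """Matrix-vector product using plain Python lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

M = [[1, 1, -2],
     [-2, 1, 1],
     [-1, 2, -1]]
b = [-5, 7, 2]

x1 = [-4, -1, 0]
x2 = [0, 3, 4]
print(matvec(M, x1) == b, matvec(M, x2) == b)  # True True

# The difference of two solutions solves Mx = 0 nontrivially, so the
# columns of M are linearly dependent.
d = [p - q for p, q in zip(x2, x1)]
print(d, matvec(M, d))  # [4, 4, 4] [0, 0, 0]
```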


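Example 3.4.39 Consider the set of vectors in $\mathbb{R}^3$

\[
B = \left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix} \right\}.
\]

Consider also the matrix $M$ whose columns are the vectors in $B$. We can solve the matrix equation
$Mx = 0$ to determine whether or not $B$ is a basis for $\mathbb{R}^3$. Using matrix reduction, we obtain the reduced
echelon form of $[M \mid 0]$ as follows

\[
\left(\begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 0 & 2 & 1 & 0 \\ 1 & 1 & 1 & 0 \end{array}\right)
\sim
\left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right).
\]

Therefore $Mx = 0$ has only the trivial solution. Therefore, by Corollary 3.4.37, $B$ is a basis for $\mathbb{R}^3$.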


3.4.6 Exercises 


For Exercises 1 to 3, determine whether the set is a basis for $P_2(\mathbb{R})$.


1. $\{1,\ x + x^2,\ x^2\}$
2 1 aot eh 
a. lay ia ae) 


For Exercises 4 to 10, suppose {v1, v2, v3, v4} is linearly independent. Determine whether the set is a 
basis for Span{v1, v2, v3, v4}. Justify your answer. 


4. $\{v_1, v_2\}$

5. $\{v_1, v_2, v_3, v_4, v_1 - 2v_3\}$

6. $\{v_1 + v_2, v_3, v_4\}$

7. $\{v_1 + v_3, v_2 + v_4, v_3, v_4\}$

8. $\{v_1 + v_2 + v_3 + v_4,\ v_1 - v_2 + v_3 - v_4,\ v_1 - v_2 - v_3 + v_4,\ v_1 - v_2 - v_3 - v_4\}$
9. $\{v_1 - 2v_2,\ v_2,\ v_3 - v_4 - v_2,\ v_4\}$
10. $\{v_1 - v_2,\ v_1 + v_2,\ 2v_1 + v_2 - v_3,\ v_1 - v_2 - v_3 - 2v_4,\ v_3 - v_4\}$


For Exercises 11 to 14, decide whether or not B is a basis for the vector space V. 


11. $B = \left\{ (1, 1, 1)^T,\ (1, 2, 3)^T,\ (3, 2, 1)^T,\ (0, 0, 1)^T \right\}$, $V = \mathbb{R}^3$
2-6 ((). (r=
2 e{('9)(2).G9)-(Ga)y=seoe 


14. B= {x*,x?+x,x?+x4+2},V = PR) 
15. Prove that there is no basis of H4(R) consisting of three or fewer heat states. 


For Exercises 16 to 20 find a basis B (that is not the standard basis) for the given vector space. State 
the dimension of the vector space. 


re {(sa5) 


17. $\{cx^2 + 3bx - 4a \mid a - b - 2c = 0\}$


a+b +e-2d=0,043b—4+d=0,a-d +b=e} 


18. M3,2(R) 
19. P3(R) 
20. $\operatorname{Span}\left\{ (1, 1, 1)^T,\ (1, 2, 3)^T,\ (3, 2, 1)^T,\ (0, 0, 1)^T \right\}$


21. Consider the set $B = \{u, v, w\}$. Show that if $B$ is a basis, then so is
$B' = \{u + 2v,\ u - w,\ v + u\}$.


22. Using Exercise 21, make a general statement about how to get a basis from another basis. Be 
careful to use accurate linear algebra language. 


For Exercises 23 to 26 provide specific examples (with justification) of the given scenario for vector
space $V = \mathbb{R}^2$.


23. $B_1$ is a nonstandard basis.

24. $B_2$ is a nonstandard basis with $B_2 \cap B_1 = \emptyset$.

25. $W \subset V$ where $W$ has three elements, $\operatorname{Span} W = V$, and $W$ is not a basis for $V$.

26. $W \subset V$ where $W$ has three elements, $\operatorname{Span} W \neq V$, and $W$ is not a basis for $V$.

27. Does the vector space {0} have a basis? If so, what is it? If not, show that it cannot have a basis. 

28. Find a basis for D(Z2) which has no elements in common with the basis of Example 3.4.18. 

29. Determine whether $B = \{I_1, I_2, I_3\}$, where the $I_n$ are given below, is a basis for the vector space
of images with the same geometric orientation as each of the $I_n$ below.

[Figure: the images $I_1$, $I_2$, and $I_3$]

30. What is the dimension of the vector space of heat signatures given in Section 3.4.1? 


31. Prove that a basis is a minimal spanning set. Specifically, suppose that $B$ is a basis for vector space
$V$ over $\mathbb{R}$. Using just the definition of a basis, prove that if any vector is removed from $B$, the
resulting set no longer spans $V$, and hence is not a basis for $V$.

32. Show that if $B = \{v_1, v_2, \ldots, v_n\}$ is a basis for vector space $V$ over $\mathbb{R}$, then $B' = \{\alpha v_1, v_2, \ldots, v_n\}$
is also a basis for $V$ for any nonzero scalar $\alpha$.

33. Let $B = \{v_1, v_2, \ldots, v_n\}$ be a basis of some vector space $V$. Determine whether $B' =
\{v_1 - v_n, v_2, \ldots, v_n\}$ is also a basis. Prove your result.

34. Prove that a subset $X$ of a vector space $V$ over $\mathbb{R}$ is a basis for $V$ if and only if $X$ is a maximal
linearly independent subset of $V$. In other words, prove that there is no superset of $X$ (i.e., there is
no strictly larger set containing $X$) that is linearly independent.

35. In Section 3.1, Exercise 28 and Section 3.3, Exercise 19, we began combining equivalent
statements. Continue combining equivalent statements by completing the theorem below. (If you
have already completed these exercises, you need only add one more statement from this section
connecting these ideas to bases of $\mathbb{R}^n$. Otherwise, you can look at the exercise statements for hints
about the earlier statements.)


Theorem 3.4.40
Let $A$ be an $n \times n$ matrix and $b \in \mathbb{R}^n$. The following statements are equivalent.


• $Ax = 0$ has a unique solution.


Prove Theorem 3.4.40.


36. Verify that the given set $B$ in Example 3.4.18 is a basis for the vector space of 7-bar LCD characters,
$D(Z_2)$. Specifically, prove the following.

(a) Explain why each vector in $B$ cannot be written as a linear combination of those that precede
it.
(b) Explain why $B$ is a spanning set for $D(Z_2)$.

37. Verify that the given set $B$ in Example 3.4.28 is a basis for the vector space of $512 \times 512 \times 512$
arrays of real numbers.


For Exercises 38 to 40, determine whether or not W is a vector space. If not, justify. If so, prove it, 
find a basis for W, and determine the dimension of W. 


w={yiy=()2)x fore eR’. 


11 
w={(o3) +41 € Masa} 


‘ w={xiy=(93)xtory eR’.
41. Show that the space of continuous functions on [0, 1] is not finite-dimensional.


3.5 Coordinate Spaces 


We have now seen that a vector space or subspace can be most efficiently described in terms of a 
set of basis vectors and we understand that any vector in the space can be written as a unique linear 
combination of the basis vectors. That is, each vector in the space can be written as a linear combination 
of the basis vectors in exactly one way. This uniqueness suggests a very simple method for creating a 
cataloging system for vectors in the vector space. This system will help us represent vectors in a more 
standard way. Using the fact that the number of basis vectors is the dimension of the space, we are able 
to further describe the space in which these standardized vectors are located. 
When considering the vector spaces of images, it becomes tedious to always need to write

[Figure: Image 2 written out as a linear combination of Image A, Image B, Image C, and Image 4]

Instead, because we have already found that Image 2 can be written uniquely as

[Figure: Image 2 = a linear combination of Image A, Image B, Image C, and Image 4]


we prefer a more compact way of expressing Image 2 as a particular linear combination of Images 
A, B, C and Image 4. In this section, we will describe a means to write image vectors (and any other 
vector) in a more compact and standardized way. 

Suppose we have a basis $B = \{u_1, u_2, \ldots, u_k\}$ for a $k$-dimensional vector space $V$. Consider an
arbitrary vector $v$ in $V$. Since $B$ spans $V$, there exist coefficients $\alpha_1, \alpha_2, \ldots, \alpha_k$ such that $v = \alpha_1 u_1 +
\alpha_2 u_2 + \cdots + \alpha_k u_k$. And since $B$ is linearly independent, the coefficients are unique: there is no other
way to express $v$ as a linear combination of $u_1, u_2, \ldots, u_k$. Thus, the coefficients $\alpha_1, \alpha_2, \ldots, \alpha_k$ are
uniquely associated with the vector $v$. We will use this understanding to create the cataloging system.


3.5.1 Cataloging Heat States 


Let’s see how these ideas work in the vector space $H_4(\mathbb{R})$. We found in Section 3.4.1 that $Y =
\{h_1, h_2, h_3, h_5\}$ formed a basis for the space of heat states $H_4(\mathbb{R})$ (see also Example 3.4.4). This
basis is shown in Figure 3.11. Now, $Y$ can be used to form a cataloging system for $H_4(\mathbb{R})$. To see
how this works, consider the heat state $v$ shown in Figure 3.12, chosen somewhat arbitrarily. We seek
coefficients $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ so that

\[
v = \alpha_1 h_1 + \alpha_2 h_2 + \alpha_3 h_3 + \alpha_4 h_5. \tag{3.17}
\]

Fig. 3.12 Arbitrarily chosen heat state $v$ in $H_4(\mathbb{R})$.


We rewrite Equation 3.17 as the system of equations 


Qa = 1/2 
a3 = 1 

a2 +a3 +a4 = 1/4 
Q] +az3 +a4 = 1 


The unique solution to the system of equations gives us that the scalars in Equation 3.17 are 


1 1 
,a2 = —-,a3 =1, andag=—-. 
a2 4 a3 a4 5 


These coefficients, along with the given basis Y, uniquely determine the heat state v. Any heat state 
can be completely and uniquely described by coefficients relative to a basis. We now lay out the path 
to express these vectors in terms of coordinates. 


Definition 3.5.1 


Given a basis $B$ for the finite-dimensional vector space $(V, +, \cdot)$, we say that $B$ is an ordered basis
for $V$ if a particular order is given to the elements of $B$.


Let us consider an example to make sense out of this definition. 


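Example 3.5.2 We know that

\[
S = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}
\]

is the standard basis for $\mathbb{R}^3$. We also know that

\[
\tilde{S} = \left\{ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right\}
\]

is the same basis. But, we would not consider $S$ and $\tilde{S}$ to be the same ordered basis because the first
and last basis vectors in $S$ are the last and first basis vectors in $\tilde{S}$. That is, the vectors are in a different
order.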


Definition 3.5.3 


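Let $v$ be a vector in the finite-dimensional vector space $(V, +, \cdot)$ over $\mathbb{F}$ and let $\mathcal{B} = \{u_1, u_2, \ldots, u_k\}$
be an ordered basis for $V$. Then the coordinate vector of $v$ relative to $\mathcal{B}$ is

$$[v]_{\mathcal{B}} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} \in \mathbb{F}^k,$$

where $\alpha_1, \alpha_2, \ldots, \alpha_k$ are the unique coefficients such that

$$v = \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_k u_k.$$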


Now, as long as we have an ordered basis for a vector space, any vector in the space is uniquely 
determined by its coordinate vector. 


Example 3.5.4 Looking back at the example above with heat states, suppose we have the coordinate
vector

$$[w]_{\mathcal{Y}} = \begin{pmatrix} 1/2 \\ 1/2 \\ 1 \\ -1 \end{pmatrix},$$

with $\mathcal{Y}$ defined above and $w \in H_4(\mathbb{R})$. This coordinate vector tells us that

$$w = (1/2)h_1 + (1/2)h_2 + (1)h_3 + (-1)h_4.$$


The reader can verify that the heat state, w is

Watch Your Language! Coordinate vectors $[v]_{\mathcal{B}}$ are vectors in $\mathbb{F}^k$ for a given basis $\mathcal{B}$ of a $k$-dimensional
vector space $(V, +, \cdot)$. Vectors $v$ in $V$ can be very abstract objects such as a grayscale image, a polynomial,
a differentiable function, a 7-bar LCD image, a heat state, etc. However abstract they may be, they can be
catalogued as vectors in $\mathbb{F}^k$ given an ordered basis.

✓ $[v]_{\mathcal{B}}$ is a coordinate vector for the vector $v$.
✓ $[v]_{\mathcal{B}} \in \mathbb{F}^k$.
✓ $V$ is a $k$-dimensional vector space.
✓ $v \in V$ is represented by $[v]_{\mathcal{B}} \in \mathbb{F}^k$.

But it would be incorrect to say

✗ $[v]_{\mathcal{B}} \in V$.
✗ $v \in \mathbb{R}^k$.
✗ $V = \mathbb{F}^k$.
✗ $v = [v]_{\mathcal{B}}$.


3.5.2 Coordinates in $\mathbb{R}^n$


We are already familiar with coordinate representations of points in the $xy$-plane and $xyz$-space (which
we call “3D space”). You might wonder if these coordinates are related to the coordinates that we are
exploring now. We now connect these two notions of coordinates. In our typical 3D space, we talk about
vectors that look like this: $v = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$. When we say that the coordinates are $x$, $y$, and $z$, respectively,
we mean that the vector points from the origin to a point that is $x$ units horizontally, $y$ units vertically,
and $z$ units up from the origin (as in Figure 3.13). If $x$, $y$, and $z$ are coordinates relative to some ordered
basis, then it is natural to ask what the chosen ordered basis is.

We can write

$$v = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = x \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + z \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

From this, we see that $x$, $y$, and $z$ are the scalar weights associated with writing the vector as a linear
combination of vectors in the standard basis for $\mathbb{R}^3$,

$$\mathcal{S} = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}.$$

Fig. 3.13 The coordinate representation of a point $v$ in 3D space with coordinates $x$, $y$, and $z$.

That is to say, our usual interpretation of coordinates in 3D assumes that we are working in the standard
(ordered) basis for $\mathbb{R}^3$.


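Example 3.5.5 Now, let us consider a different ordered basis for $\mathbb{R}^3$:

$$\mathcal{B} = \left\{ v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \; v_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \; v_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}.$$

The coordinates of the vector $w = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$ in the standard basis are 1, 2, and 3, respectively, but in this
alternate basis, they are found by finding the scalars $\alpha_1$, $\alpha_2$, and $\alpha_3$ so that

$$w = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \alpha_1 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \alpha_2 \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + \alpha_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

We find that $\alpha_1 = 2$, $\alpha_2 = -1$, and $\alpha_3 = 2$. (Be sure to check this.) So, we can represent the vector $w$
in coordinates according to the basis $\mathcal{B}$ as

$$[w]_{\mathcal{B}} = \begin{pmatrix} 2 \\ -1 \\ 2 \end{pmatrix}.$$

The vector $w$ in $\mathbb{R}^3$ is the vector that points from the origin to the regular Cartesian grid location one
unit along the $x$-axis, then two units along the $y$-axis direction, and then three units along the $z$-axis
direction. $[w]_{\mathcal{B}}$ is the representation of $w$ relative to basis $\mathcal{B}$. The vector $w$ itself remains unchanged.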
See Figure 3.14. 
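
The check suggested in Example 3.5.5 can also be automated. The following is a minimal sketch, not part of the original text, that finds the coordinates of $w = (1, 2, 3)$ relative to the ordered basis $v_1 = (1,1,1)$, $v_2 = (1,0,1)$, $v_3 = (0,0,1)$ by Gauss-Jordan elimination. The function name `coordinates` is a hypothetical helper, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def coordinates(basis, w):
    """Solve a1*basis[0] + ... + an*basis[n-1] = w by Gauss-Jordan elimination.

    Assumes `basis` really is a basis of R^n (so a pivot always exists).
    """
    n = len(basis)
    # Augmented matrix: column j holds basis vector j, last column holds w.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(w[i])]
            for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it into place.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so that the pivot entry equals 1.
        p = rows[col][col]
        rows[col] = [entry / p for entry in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


B = [(1, 1, 1), (1, 0, 1), (0, 0, 1)]   # ordered basis from Example 3.5.5
w = (1, 2, 3)
coords = coordinates(B, w)
print([str(c) for c in coords])
```

Reordering the vectors in `B` reorders the entries of the result, which is why the basis must be an ordered basis.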

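If we are given bases for $\mathbb{R}^3$, $\mathcal{B}_1 = \{v_1, v_2, v_3\}$ and $\mathcal{B}_2 = \{u_1, u_2, u_3\}$, then if $w = \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3$, we have

$$[w]_{\mathcal{B}_1} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix}.$$

And, if $w = \beta_1 u_1 + \beta_2 u_2 + \beta_3 u_3$, we have

$$[w]_{\mathcal{B}_2} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}.$$

$w$, $[w]_{\mathcal{B}_1}$, and $[w]_{\mathcal{B}_2}$ are different representations of the same vector.

Fig. 3.14 The coordinate vectors $[w]_{\mathcal{S}}$ (left) and $[w]_{\mathcal{B}}$ (right) from Example 3.5.5.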


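Example 3.5.6 Let $w = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}$. We leave as an exercise to find $[w]_{\mathcal{B}_1}$ and $[w]_{\mathcal{B}_2}$, given

$$\mathcal{B}_1 = \left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}$$

and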
0 
By = »{ 17,70 
1 1 


See Exercise 3. 


3.5.3 Example Coordinates of Abstract Vectors 


The following examples illustrate that abstract objects in a $k$-dimensional vector space $V$ can be
represented as coordinate vectors in $\mathbb{F}^k$, given an ordered basis for $V$. Coordinates present a useful
and powerful tool for cataloging vectors and performing computations because their representations
are familiar vectors in $\mathbb{F}^k$.


Example 3.5.7 Consider the 16-dimensional vector space of 4 × 4 grayscale images. Recall, this
vector space is defined to have scalars in $\mathbb{R}$. Let us now see how we can catalog 4 × 4 grayscale
images using vectors in the more familiar vector space, $\mathbb{R}^{16}$. Let $v$ be the image

[Figure: the 4 × 4 grayscale image $v$.]

where black indicates a pixel value of zero and white indicates a pixel value of 3. Consider the standard
(ordered) basis $\mathcal{B} = \{b_1, b_2, \ldots, b_{16}\}$ as shown:

[Figure: the sixteen basis images $b_1, \ldots, b_{16}$.]

Each element of $\mathcal{B}$ consists of all black pixels except for a single pixel with a value of 1. We have

$$\begin{aligned}
v = {} & 3b_1 + 1b_2 + 1b_3 + 2b_4 + 0b_5 + 0b_6 + 2b_7 + 0b_8 \\
& + 3b_9 + 0b_{10} + 3b_{11} + 2b_{12} + 1b_{13} + 0b_{14} + 2b_{15} + 3b_{16},
\end{aligned}$$

so that

$$[v]_{\mathcal{B}} = (3 \;\, 1 \;\, 1 \;\, 2 \;\, 0 \;\, 0 \;\, 2 \;\, 0 \;\, 3 \;\, 0 \;\, 3 \;\, 2 \;\, 1 \;\, 0 \;\, 2 \;\, 3)^T.$$

The coordinate vector $[v]_{\mathcal{B}}$ is in $\mathbb{R}^{16}$ and represents the 4 × 4 grayscale image $v$. In this standard
ordered basis the coordinate values in $[v]_{\mathcal{B}}$ are the same as the pixel values in $v$ (in a particular order).


In the next example, we consider coordinate vectors in the less familiar vector space $\mathbb{Z}_2^7$. The process
for finding such coordinate vectors is the same.


Example 3.5.8 Let us consider the 7-dimensional vector space of 7-bar LCD images whose scalar set
is $\mathbb{Z}_2$. Consider the two different bases $\mathcal{B}_1$ and $\mathcal{B}_2$.

[Figure: the seven LCD images in $\mathcal{B}_1$ and the seven LCD images in $\mathcal{B}_2$.]

Let $v$ be the “4” character. To find the coordinate vector $[v]_{\mathcal{B}_1}$, we write $v$ as a linear combination of
the basis elements in $\mathcal{B}_1$. Actually, $v$ can be written as a linear combination of only the second, third,
fourth, and sixth basis elements. Therefore,

$$[v]_{\mathcal{B}_1} = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} \in \mathbb{Z}_2^7.$$

Now, if we write $v$ as a linear combination of the basis elements in $\mathcal{B}_2$, we find a different set of
coefficients. Notice that $[v]_{\mathcal{B}_2}$ is different from $[v]_{\mathcal{B}_1}$ even though they represent the same 7-bar LCD
character.

Remember, $v$ is the LCD image of the “4” character, not a vector in $\mathbb{Z}_2^7$. And, $[v]_{\mathcal{B}_1}$ and $[v]_{\mathcal{B}_2}$ are
two different coordinate vector representations in $\mathbb{Z}_2^7$, with respect to two different (ordered) bases, $\mathcal{B}_1$
and $\mathcal{B}_2$.


We provide the next example not only to practice finding coordinate vectors, but to also use a 
coordinate vector and an ordered basis to find the corresponding vector. 


Example 3.5.9 Consider the subspace $V = \{ax^2 + bx + c \mid a + b - 2c = 0\} \subseteq P_2(\mathbb{R})$. We saw in
Example 3.4.27 that a basis for $V$ is $\mathcal{B} = \{-x^2 + x, \; 2x^2 + 1\}$. We also know that $v = 3x^2 - 3x \in V$.
This means that we can write $v$ as a coordinate vector in $\mathbb{R}^2$. We do this by finding $\alpha_1, \alpha_2 \in \mathbb{R}$ so that

$$v = \alpha_1(-x^2 + x) + \alpha_2(2x^2 + 1).$$

Notice that

$$v = -3(-x^2 + x) + 0(2x^2 + 1).$$

Therefore,

$$[v]_{\mathcal{B}} = \begin{pmatrix} -3 \\ 0 \end{pmatrix} \in \mathbb{R}^2.$$


Working within the same subspace, we consider another coordinate vector, 


[wlp = (3) eR’. 


Using our understanding of the coordinate vector notation and which ordered basis is being used to 
create the coordinate vector, we write w as 


Therefore, 


wl Oe 42) DOrt + lS 4 4 2 1 V. 


In some instances, it is much easier to work in one coordinate space than in another. The following 
example shows how we can change a coordinate vector, according to one ordered basis, into another 
coordinate vector, according to a different ordered basis. 


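Example 3.5.10 Consider the subspace

$$V = \left\{ \begin{pmatrix} a & a+b \\ a-b & b \end{pmatrix} \;\middle|\; a, b \in \mathbb{R} \right\} \subseteq M_{2 \times 2}(\mathbb{R}).$$

In order to catalog vectors in $V$, using coordinate vectors, we must first find a basis for $V$. Notice that
we can write $V$ as the span of the linearly independent vectors in

$$\mathcal{B}_1 = \left\{ \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ -1 & 1 \end{pmatrix} \right\}.$$

Indeed,

$$V = \left\{ a \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} + b \begin{pmatrix} 0 & 1 \\ -1 & 1 \end{pmatrix} \;\middle|\; a, b \in \mathbb{R} \right\}.$$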


Consider the vector w so that 


Then, we know that
Suppose we would like to change to a different coordinate system, relative to the ordered basis


®={(01)-(2")} 


(Check to see that you actually believe this is a basis of $V$.) That is, suppose we want to write $[w]_{\mathcal{B}_2}$,
the coordinate vector according to the basis $\mathcal{B}_2$. Then, we need to find scalars $\alpha$ and $\beta$ so that


_f{ 1 3\_ 12 +8 1 0 
PN eh) Nl Fetj* 
By inspection of the matrix entries, or using elimination on a system of equations, we see that a = ; 
and 3 = —3. Therefore, 


The next two examples provide more practice and clarity for the reader. 
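Example 3.5.11 Consider the subspace $V = \{ax^2 + (b - a)x + (a + b) \mid a, b \in \mathbb{R}\} \subseteq P_2(\mathbb{R})$. Since

$$V = \{a(x^2 - x + 1) + b(x + 1) \mid a, b \in \mathbb{R}\} = \text{Span}\{x^2 - x + 1, \; x + 1\},$$

a basis for $V$ is

$$\mathcal{B} = \{v_1 = x^2 - x + 1, \; v_2 = x + 1\}.$$

Thus, $\dim V = 2$ and vectors in $V$ can be represented by coordinate vectors in $\mathbb{R}^2$. Let $v = 3x^2 + x + 7$.
We can write $v$ in terms of $\mathcal{B}$ as $v = 3v_1 + 4v_2$ (check this). Therefore, $v \in V$ and the coordinate vector
for $v$ is $[v]_{\mathcal{B}} = \begin{pmatrix} 3 \\ 4 \end{pmatrix} \in \mathbb{R}^2$.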
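
One way to carry out the check in Example 3.5.11 is to match coefficients. The short derivation below is a sketch of that check; the symbols $\alpha_1$ and $\alpha_2$ are introduced here only for illustration.

```latex
\begin{align*}
\alpha_1 (x^2 - x + 1) + \alpha_2 (x + 1)
  &= \alpha_1 x^2 + (\alpha_2 - \alpha_1) x + (\alpha_1 + \alpha_2). \\
\intertext{Comparing with $3x^2 + x + 7$ gives}
\alpha_1 &= 3, \qquad \alpha_2 - \alpha_1 = 1, \qquad \alpha_1 + \alpha_2 = 7.
\end{align*}
```

The first two equations give $\alpha_1 = 3$ and $\alpha_2 = 4$, and the third is then satisfied since $3 + 4 = 7$. Had the third equation failed, the polynomial would not lie in $V$.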


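Example 3.5.12 Now, consider the subspace

$$W = \left\{ \begin{pmatrix} \alpha & \beta \\ \gamma & \alpha + \beta + \gamma \end{pmatrix} \;\middle|\; \alpha, \beta, \gamma \in \mathbb{R} \right\} \subseteq M_{2 \times 2}(\mathbb{R}).$$

A basis for $W$ is

$$\mathcal{B} = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} \right\}.$$

Now consider the vector $w = \begin{pmatrix} 3 & 4 \\ -1 & 6 \end{pmatrix} \in M_{2 \times 2}(\mathbb{R})$. If $w \in W$, we can find the coordinate vector, $[w]_{\mathcal{B}} \in \mathbb{R}^3$. Let $v_1$, $v_2$, and $v_3$ be the above basis vectors, respectively. Then we determine whether there are
scalars $\alpha_1, \alpha_2, \alpha_3$ so that $w = \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3$. That is, we determine whether there are scalars
so that

$$\begin{pmatrix} 3 & 4 \\ -1 & 6 \end{pmatrix} = \alpha_1 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \alpha_2 \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} + \alpha_3 \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}.$$

Equating corresponding entries in the matrices on the left and right leads to the system of equations

$$\begin{aligned}
3 &= \alpha_1 \\
4 &= \alpha_2 \\
-1 &= \alpha_3 \\
6 &= \alpha_1 + \alpha_2 + \alpha_3.
\end{aligned}$$

Thus, we find that $\alpha_1 = 3$, $\alpha_2 = 4$, $\alpha_3 = -1$, telling us that $w \in W$ and has coordinate vector

$$[w]_{\mathcal{B}} = \begin{pmatrix} 3 \\ 4 \\ -1 \end{pmatrix}.$$

Now, consider the vector $\tilde{w} = \begin{pmatrix} 3 & 3 \\ 4 & 4 \end{pmatrix} \in M_{2 \times 2}(\mathbb{R})$. Following the same procedure as above, we can
determine whether or not $[\tilde{w}]_{\mathcal{B}}$ exists. That is, we determine whether there are scalars so that

$$\begin{pmatrix} 3 & 3 \\ 4 & 4 \end{pmatrix} = \alpha_1 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \alpha_2 \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} + \alpha_3 \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}.$$

The corresponding system of equations is

$$\begin{aligned}
3 &= \alpha_1 && \text{(Upper left entry)} \\
3 &= \alpha_2 && \text{(Upper right entry)} \\
4 &= \alpha_3 && \text{(Lower left entry)} \\
4 &= \alpha_1 + \alpha_2 + \alpha_3 && \text{(Lower right entry)}.
\end{aligned}$$

The system above has no solution: the first three equations force $\alpha_1 + \alpha_2 + \alpha_3 = 10 \neq 4$. Therefore, $\tilde{w} \notin W$ and $[\tilde{w}]_{\mathcal{B}}$ does not exist.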


In the previous example, we were able to answer two questions with one set of steps. That is, we can 
determine whether a vector space contains a particular vector while computing the coordinate vector 
at the same time. 
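
The "one set of steps" idea can be sketched in code for the subspace $W$ of Example 3.5.12, where each matrix in $W$ has a lower right entry equal to the sum of its other three entries. The function `coordinates_in_W` below is a hypothetical helper written for this illustration only: it returns the coordinate vector when the matrix is in $W$ and `None` otherwise.

```python
def coordinates_in_W(m):
    """Return [a1, a2, a3] if the 2x2 matrix m lies in W, otherwise None.

    m is given as ((upper_left, upper_right), (lower_left, lower_right)).
    """
    (ul, ur), (ll, lr) = m
    # The first three entries force the only possible coefficients.
    a1, a2, a3 = ul, ur, ll
    # The lower right entry must equal a1 + a2 + a3 for m to be in W.
    if lr == a1 + a2 + a3:
        return [a1, a2, a3]
    return None


first = coordinates_in_W(((3, 4), (-1, 6)))    # first vector of Example 3.5.12
second = coordinates_in_W(((3, 3), (4, 4)))    # second vector of Example 3.5.12
print(first, second)
```

The membership test and the coordinate computation share the same equations, so answering one question answers the other.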


3.5.4 Brain Scan Images and Coordinates 


One of our larger goals is to understand and determine how density distributions of the human head 
(brain scan images) produce a given set of radiographs. Thus far, we have focused on understanding 
both brain images and radiographic images as mathematical objects. We have arrived at an important 
place in which these objects can not only be arithmetically manipulated, but can be categorized and 
cataloged, as well. Consider the following stepping stones. 


1. Vector Space Arithmetic. We can add (and subtract) images and multiply (scale) images by scalars.
2. Vector Space Closure. All possible images are contained in a vector space. Arithmetic operations
on images lead only to images of the same class.
3. Subspace. Some subclasses of images retain all the properties of the larger vector space.
4. Linear Combination. Images can be expressed as simple weighted sums of other images. There
are relationships among elements of sets of images.
5. Spanning Set. A small (usually finite) set of images can be used to characterize larger (possibly
infinite) subspaces of images through linear combinations.
6. Linear Independence. Minimal spanning sets have the property of linear independence in which
no image can be written as a linear combination of the others.
7. Basis. Minimal spanning sets of images comprise a most efficient catalog set for a vector space.
Each image in the space is represented uniquely as a linear combination of basis images.
8. Coordinates. Using a given set of basis images, every image in a space is uniquely paired with a
coordinate vector in $\mathbb{R}^n$.


This last item, coordinates, is a major breakthrough. Now, and only now, can we represent arbitrary
abstract vectors (images, matrices, functions, polynomials, heat states, etc.) from finite-dimensional
vector spaces as vectors in $\mathbb{R}^k$. Given an ordered basis for a (possibly very abstract) vector space, we
can perform all mathematical operations in the equivalent (and much more familiar) coordinate space
(see Exercise 8).
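
As a small numerical illustration (not a proof) of working in coordinate space, the sketch below uses the basis $\{x^2 - x + 1, \; x + 1\}$ from Example 3.5.11. Polynomials are stored as coefficient triples, and the second coordinate vector and the scalar are hypothetical values chosen only for this example. It checks that combining coordinate vectors first and then rebuilding the polynomial agrees with rebuilding first and then combining polynomials.

```python
V1 = (1, -1, 1)   # x^2 - x + 1, stored as (x^2, x, constant) coefficients
V2 = (0, 1, 1)    # x + 1


def from_coords(c):
    """Rebuild the polynomial c[0]*V1 + c[1]*V2 from a coordinate vector c."""
    return tuple(c[0] * a + c[1] * b for a, b in zip(V1, V2))


v_coords = (3, 4)     # coordinates of 3x^2 + x + 7 (Example 3.5.11)
w_coords = (1, -2)    # hypothetical second coordinate vector
alpha = 5             # hypothetical scalar

# Work in coordinate space first, then convert back to a polynomial.
combo_coords = tuple(alpha * a + b for a, b in zip(v_coords, w_coords))
lhs = from_coords(combo_coords)

# Convert back first, then do the arithmetic on the polynomials themselves.
rhs = tuple(alpha * p + q
            for p, q in zip(from_coords(v_coords), from_coords(w_coords)))

print(lhs, rhs)
```

A single numerical agreement does not establish the general property; Exercise 8 asks for the proof.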


Path to New Applications

When looking for a function that predicts results from collected data, researchers will use 
regression analysis. This method requires you to find a coordinate vector, with respect to chosen 
basis functions, that minimizes an error of fit. See Section 8.3.2 for more information about 
connections between linear algebra tools and regression analysis. 


3.5.5 Exercises 


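1. Using the vector space $V$ and basis $\mathcal{B}$ in Example 3.5.11, determine if the given vector is in $V$,
and if so, find the coordinate vector.

(a) $[5x^2 - 7x + 3]_{\mathcal{B}}$.
(b) $[x^2 + 2]_{\mathcal{B}}$.
(c) $[x^2 + 4x + 2]_{\mathcal{B}}$.

2. Let $X = \{(x, y, x - y, x + y) \mid x, y \in \mathbb{R}\}$. Find a basis, $\mathcal{B}$, for $X$. Determine if $v = (2, 3, 1, 5) \in X$.
If so, find $[v]_{\mathcal{B}}$.
3. Complete Example 3.5.6.
4. Given vector space $V$ with basis $\mathcal{B} = \{2x + 1, x^2, 2x^3\}$, find $w$ when

(a) $[w]_{\mathcal{B}} = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$

(b) $[w]_{\mathcal{B}} = \begin{pmatrix} -1 \\ 0 \\ 3 \end{pmatrix}$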


: 12 01 00 
5. Given B = {(; ae i) , ({ 1) f ana v = Span B, 


(a) Verify that G is indeed a basis for V. 


(b) Find the element v € V so that [v]g = | 0 
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(c) Find the element v € V so that [v]g 


- OO 


6. Given $V = \text{Span}\{v_1, v_2, v_3, v_4\}$. Suppose also that $\mathcal{B}_1 = \{v_1, v_2, v_3, v_4\}$ is linearly independent.
Let $u = v_1 + 2v_2 - 3v_3 - v_4$.

(a) Find $[u]_{\mathcal{B}_1}$.
(b) Show that $\mathcal{B}_2 = \{v_1 - v_2, \; v_1 + v_2 + 3v_3 - v_4, \; v_2, \; v_3\}$ is a basis for $V$.
(c) Find $[u]_{\mathcal{B}_2}$.


7. Use the basis $\mathcal{B}_2$ from Example 3.5.8 to find the following coordinate vectors.


(a) 


(b) 


(c) 


8. Let $v$ and $w$ be vectors in finite-dimensional vector space $V$, $\alpha$ be a scalar, and $\mathcal{B}$ a basis for $V$.
Prove the following.

(a) $[\alpha v]_{\mathcal{B}} = \alpha [v]_{\mathcal{B}}$.
(b) $[v + w]_{\mathcal{B}} = [v]_{\mathcal{B}} + [w]_{\mathcal{B}}$.
(c) The additive inverse of the coordinate vector for $v$ is the coordinate vector of the additive
inverse of $v$.
(d) The coordinate vector for the additive identity vector is independent of choice of basis.

9. The standard ordered basis for the set of all 108 × 108 grayscale images is defined analogously
to the standard ordered basis of Example 3.5.7. This basis is simple, easy to enumerate, and lends
itself to straightforward coordinate vector decomposition. Suggest a different, potentially more
useful, basis in the context of medical diagnosis of brain images.

10. Dava is training for a sprint triathlon consisting of a 3 mile run, followed by a 20 mile bike ride,
and ending with a half mile swim. Recently, she has completed three training sessions with total
times shown in the table below.

run (mi) | bike (mi) | swim (mi) | time (min)
2        | 2         | 1/2       | 35
0        | 10        | 1         | 60
5        | 0         | 1         | 65

Calculate Dava’s expected triathlon time using two different methods.

(a) Compute the rates at which Dava can expect to perform in each event (minutes per mile)
assuming that all times scale linearly with distance. Then determine her total expected time.
(b) Consider Dava’s training run distances as vectors in $\mathbb{R}^3$. Show that this set of vectors is linearly
independent. Using these vectors as a basis, find the linear combination of training runs that
results in a vector of sprint triathlon distances. Then compute her expected time.

11. (Data Classification.) Suppose that $\mathcal{B} = \{b_1, \ldots, b_n\}$ is a basis for $\mathbb{R}^n$. One can show, with appropriate
definitions, that the $(n - 1)$-dimensional subspace $S = \text{Span}\{b_1, \ldots, b_{n-1}\}$ (a hyperplane)
separates $\mathbb{R}^n$ into two regions. Informally, when we say that a subspace separates $\mathbb{R}^n$ into two
regions, we mean that any point not in this subspace lies on one side of the subspace or the other.
Given two vectors $v$ and $w$ in $\mathbb{R}^n \setminus S$, find a method, using coordinates, to determine if $v$ and $w$
lie on the same side of the hyperplane. Explain.


Path to New Applications

These ideas are useful in data classification. If we want to group data (vectors) into categories,
we can look for subspaces that separate the data points that have different characteristics. (See
Exercise 30 of Section 7.1.) For more information about data classification using linear algebra
tools, see Section 8.3.2.

Linear Transformations 


In Chapter 1, we introduced several applications. In Chapter 2, we considered vector spaces of objects 
to be radiographed and their respective radiographs, vector spaces of images of a particular size and 
vector spaces of heat states. In these particular applications, we see that vectors from one vector space 
are transformed into vectors in the same or another vector space. For example, the heat along a rod is
measured as a heat state and the process of the rod cooling takes one heat state to another. For the brain 
scan radiography application, object vectors are transformed into radiograph vectors as in Figure 4.1. 

In this chapter, we will discuss the functions that transform vector spaces to vector spaces. We 
call such functions transformations. We then consider the key property of transformations that is
necessary so that we can apply linear algebra techniques. We will explore, in more detail, how heat states 
transform through heat diffusion. Using the result for heat state transformations and our knowledge 
of coordinate spaces we find a way to represent these transformations using matrices. We categorize 
linear transformations and their matrix representations by a number called the determinant. Finally, 
we explore ideal properties for the reconstruction of brain images from radiographs. We then explore 
these properties to learn more information about the vector spaces being transformed. 


4.1 Explorations: Computing Radiographs and the Radiographic 
Transformation 


In preceding chapters, we have seen how to model images as vectors (and the space of images as a 
vector space). We now focus on modeling the radiographic process that starts with a brain and produces 


an image of the brain.

Fig. 4.1 In brain scan radiography, objects are transformed into radiographs.

[Figure labels, left to right: X-ray source, incident X-ray beam, region of interest, transmitted X-ray beam, detector.]

Fig. 4.2 A schematic of the radiographic process. The function $E$ measures the intensity of the X-ray beam. Here, the
X-rays are traveling in the $x$ direction.


We will need to make some further simplifying assumptions to model this process with linear 
algebra. We begin by describing some basic principles of the process and how they lead to a linear 
algebraic model of the radiographic process. Then, we explore how the radiographs on small objects 
are computed. We discuss the notation and setup needed to do these computations as well. 

The material in this section is a no-frills sketch of the basic mathematical framework for radiography, 
and only lightly touches on physics concepts, since a deep understanding of physics principles is not 
necessary for understanding radiography. We refer the reader to Appendix A for more details on the 
process and the physics behind it. 


4.1.1 Radiography on Slices 


In transmission radiography such as that used in CAT scans, changes in an X-ray or other high-energy 
particle beam are measured and recorded after passing through an object of interest, such as a brain 
(Figure 4.2). 

To further simplify the setup, we consider the problem one layer at a time. That is, we fix a height, 
and consider the slice of the object and the slice of detector at that height. In what follows we model the 
radiographic process restricted to a single height. At a given height, the slice of brain is 2-dimensional 
(2D), and the radiographic image of the slice is a 1-dimensional (1D) image. 

To get the full picture, we “paste together” the lower dimensional objects and images; this models 
the radiographic process that takes a 3-dimensional (3D) brain and transforms it into a 2D image. 

For our basic setup, the detector is divided into some fixed number ($m$) of “bins” (numbered, say,
from 1 to $m$). For the $k$th bin, we denote the initial number of photons sent by $p_k^0$ and the total detected
(after passing through the object) by $p_k$. Then it turns out that

$$p_k = p_k^0 e^{-s_k/\alpha},$$

where $s_k$ is the total mass in the path of the $k$th bin portion of the beam, and $\alpha$ is a constant proportional
to the bin area.

Fig. 4.3 Object space and radiograph space discretization. The figure labels $s_k = \sum_{j=1}^{N} T_{kj} x_j$, where $s_k$ is the total mass along beam path $k$, $T_{kj}$ is the fraction of voxel $j$ in beam path $k$, and $x_j$ is the total mass in voxel $j$.


We consider (a slice of) the region of interest to be subdivided into $N$ cubic voxels (3D pixels). Let
$x_j$ be the mass in object voxel $j$ and $T_{kj}$ the fraction of voxel $j$ in beam path $k$ (see Figure 4.3). (Note
that $x_j$ is *not* related to the direction that the X-ray beams are traveling.) Then the mass along beam
path $k$ is

$$s_k = \sum_{j=1}^{N} T_{kj} x_j,$$

and the expected photon count $p_k$ at radiograph pixel $k$ is given by

$$p_k = p_k^0 e^{-\frac{1}{\alpha} \sum_{j=1}^{N} T_{kj} x_j},$$

or equivalently, we define $b_k$ as

$$b_k \equiv -\alpha \ln \frac{p_k}{p_k^0} = \sum_{j=1}^{N} T_{kj} x_j.$$

What we have done here is replace photon counts $p_k$ with the quantities $b_k$. This amounts to a variable
change that allows us to formulate a matrix expression for the radiographic transformation

$$b = Tx.$$


So what have we accomplished? We are modeling slices of objects x as vectors in a vector space, in 
much the same way that we modeled images before. As before, we model slices of radiographic images 
b as vectors in another vector space. And, the calculations above produce a mathematical model of 
the radiographic transformation that is given by matrix multiplication. Moreover, the matrix does not 
depend on the specific vectors x or b, it only depends on the way the object and detectors are arranged. 
This means that we can determine this matrix before we ever produce a radiograph. 
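
The change of variables from photon counts to the quantities $b_k$ can be sketched numerically. In the block below every number is a hypothetical value chosen for illustration (the constant `alpha`, the fractions in `T`, the voxel masses `x`, and the initial counts `p0`); none comes from the text. It simulates the expected counts $p_k = p_k^0 e^{-s_k/\alpha}$ and then recovers $b_k = -\alpha \ln(p_k / p_k^0)$, which should match the entries of $Tx$ up to rounding.

```python
import math

alpha = 2.0                       # hypothetical constant proportional to bin area
T = [[0.5, 1.0, 0.0, 0.5],        # hypothetical fractions T_kj: 2 beam paths, 4 voxels
     [0.5, 0.0, 1.0, 0.5]]
x = [1.0, 2.0, 0.0, 4.0]          # hypothetical voxel masses x_j
p0 = [1000.0, 1000.0]             # hypothetical initial photon counts p_k^0

# Mass along each beam path: s_k = sum_j T_kj * x_j (this is the product T x).
s = [sum(t_kj * x_j for t_kj, x_j in zip(row, x)) for row in T]

# Expected photon counts after passing through the object.
p = [p0_k * math.exp(-s_k / alpha) for p0_k, s_k in zip(p0, s)]

# Variable change: b_k = -alpha * ln(p_k / p_k^0).
b = [-alpha * math.log(p_k / p0_k) for p_k, p0_k in zip(p, p0)]

print(s, b)
```

Note that `T` is fixed before any masses or counts are supplied, which mirrors the observation that the matrix depends only on the arrangement of object and detector.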

In the next section, we delve into the process of producing this matrix in a little more detail.

Fig. 4.4 Left: The geometry of a single-view radiographic transformation. Right: The geometry of a multiple-view
radiographic transformation showing view 1, view 2, and view $a$ (for some integer $a$).


4.1.2 Radiographic Scenarios and Notation


Keep in mind that we are working with 2D slices of the object/region of interest. For example, a
single-view radiographic setup consists of a 2D area of interest where the object will be placed, and a 
1D screen onto which the radiograph will be recorded. A multiple-view radiographic setup consists of 
a single 2D area of interest experimentally designed so that radiographs of this area can be recorded 
for different locations about the object. 

The geometry of a radiographic scenario is illustrated in Figure 4.4. Again, this is just for a single
slice. The notation we use is as follows.


Radiographic Scenario Notation


• Slice for region of interest: $n$ by $n$ array of voxels, where $n$ is an even integer.
• Total number of voxels in each image slice is $N = n^2$.
• Each voxel has a width and height of 1 unit.
• For each radiographic view, we record $m$ pixels of data, where $m$ is an even integer. The center
of each radiographic view “lines up with” the center of the object.
• The width of each pixel is ScaleFac. If ScaleFac = 1, then pixel width is the same as voxel
width.
• Number of radiographic angles (views): $a$.
• Total number of pixels in the radiograph image: $M = am$.
• Angle of the $i$th view (the angle of the line, connecting centers of the object and the radiograph,
measured in degrees east of south): $\theta_i$.
• Object mass at voxel $j$ is $x_j$.
• Recorded radiograph value at pixel $k$ is $b_k$.


In this exploration, we will be constructing matrix representations of the radiographic process for
several different scenarios.

We consider object coordinate vectors $[x]_{\mathcal{B}_O}$ and radiographic data coordinate vectors $[b]_{\mathcal{B}_R}$, both
using standard bases. Thus, objects are represented as vectors in $\mathbb{R}^N$ and radiographs as vectors in
$\mathbb{R}^M$. What will be the size of the corresponding matrix that transforms an object vector $[x]_{\mathcal{B}_O}$ to its
corresponding radiograph vector $[b]_{\mathcal{B}_R}$ through multiplication by the matrix? We call this matrix a
matrix operator because the operation on objects is to multiply by the matrix. Read again the definition
of the matrix operator from Section 4.1.1 (or Appendix A). Recall the key point that particular
values for $[x]$ and $[b]$ are not necessary for computing the matrix of the radiographic operator.


4.1.3 A First Example 


Consider the setup pictured below. For this scenario, we have 


• Total number of voxels: $N = 4$ ($n = 2$).
• Total number of pixels: $M = m = 2$.
• ScaleFac = $\sqrt{2}$.
• Number of views: $a = 1$.
• Angle of the single view: $\theta_1 = 45°$.


Recalling that $T_{kj}$ is the fraction of voxel $j$ which projects perpendicularly onto pixel $k$, the matrix
associated with this radiographic setup is


Be sure and check this to see if you agree. Hence, for any input vector [x], the radiographic output is 
[b] = T[x]. Find the output when the object is the vector 


For this simple example, it was easy to produce the matrix $T$ “by hand.” But in general, we will be
radiographing much larger objects. Code that automates this process is in the MATLAB/OCTAVE file
tomomap.m.

Fig. 4.5 Sketch of a radiographic scenario with three views at $\theta_1 = 315°$, $\theta_2 = 45°$, and $\theta_3 = 135°$.


4.1.4 Radiographic Setup Example 


To illustrate how we identify the angles in a radiographic setup, consider the radiographic scenario 
below. 


• Total number of voxels: $N = 4$ ($n = 2$).
• Number of pixels per radiographic view: $m = 2$.
• Number of views: $a = 3$.
• Total number of pixels: $M = am = 6$.
• Stretch factor for radiographic pixels: ScaleFac = 1.
• Angle of the views: $\theta_1 = 315°$, $\theta_2 = 45°$, and $\theta_3 = 135°$.


A sketch for this setup is found in Figure 4.5. 
Using tomomap.m, we type 


T = full(tomomap(2, 2, [315, 45, 135], 1))


With OCTAVE/MATLAB output: 


T = 
0.50000 0.00000 0.82843 0.50000 
0.50000 0.82843 0.00000 0.50000 
0.50000 0.82843 0.00000 0.50000 
0.50000 0.00000 0.82843 0.50000 
0.00000 0.50000 0.50000 0.82843 
0.82843 0.50000 0.50000 0.00000 


The reader should verify that this matrix represents the expected transformation. 
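
One way to start that verification is to apply the matrix to simple objects. The sketch below copies the six-by-four matrix from the output above and multiplies it by object vectors; the helper name `radiograph` and the sample object `x` are hypothetical choices for illustration. Applying $T$ to a standard basis object (mass 1 in voxel 1, zero elsewhere) returns the first column of $T$, which is the radiograph of that single voxel across all three views.

```python
# Matrix T copied from the OCTAVE/MATLAB output above
# (rows = radiograph pixels, columns = object voxels).
T = [
    [0.50000, 0.00000, 0.82843, 0.50000],
    [0.50000, 0.82843, 0.00000, 0.50000],
    [0.50000, 0.82843, 0.00000, 0.50000],
    [0.50000, 0.00000, 0.82843, 0.50000],
    [0.00000, 0.50000, 0.50000, 0.82843],
    [0.82843, 0.50000, 0.50000, 0.00000],
]


def radiograph(T, x):
    """Return b = T x for an object vector x given as a list of voxel masses."""
    return [sum(t * x_j for t, x_j in zip(row, x)) for row in T]


e1 = [1, 0, 0, 0]          # unit mass in voxel 1 only
x = [2, 0, 1, 3]           # hypothetical object, chosen only for illustration

b_e1 = radiograph(T, e1)   # equals the first column of T
b = radiograph(T, x)       # M = a*m = 6 radiograph values
print(b_e1)
print(len(b))
```

The length of each output is $M = 6$ and the length of each input is $N = 4$, matching the scenario parameters listed above.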


4.1.5 Exercises 


Some of the following exercises will ask you to use MATLAB or OCTAVE to compute radiographic
transformations. You will need to first download the function tomomap.m from the IMAGEMath.org
website. If you do not yet have access to personal or department computing resources, you can complete
this assignment using OCTAVE-Online in a web browser.

Standard Bases for Radiographic Data


1. Consider the radiographic matrix operator $T$ described in Appendix A and the example of
Section 4.1.4. Suppose $\mathcal{B}_O$ is the standard ordered basis for the object space and $\mathcal{B}_R$ is the standard
ordered basis for the radiograph space. The order for these bases is consistent with the description
in this section.


(a) Sketch and describe basis elements in $\mathcal{B}_O$.
(b) Sketch and describe basis elements in $\mathcal{B}_R$.


With an ordered basis we are now able to represent both objects and radiographs as coordinate vectors
in $\mathbb{R}^N$ and $\mathbb{R}^M$, respectively. This will make notation much simpler in future exercises.


Constructing Radiographic Transformation Matrices 
2. Suppose you have the setup where 


• Height and width of the image in voxels: $n = 2$ (Total voxels $N = 4$)
• Pixels per view in the radiograph: $m = 2$
• ScaleFac = 1
• Number of views: $a = 2$
• Angle of the views: $\theta_1 = 0°$, $\theta_2 = 90°$


(a) Sketch this setup. 
(b) Calculate the matrix associated with the setup. 
(c) Find the radiographs of the two objects below. 


(d) Represent the radiographs in the coordinate space you described in Exercise 1 above.
3. Suppose you have the setup where 


• Height and width of the image in voxels: $n = 2$ (Total voxels $N = 4$)
• Pixels per view in the radiograph: $m = 2$
• ScaleFac = 1
• Number of views: $a = 2$
• Angle of the views: $\theta_1 = 0°$, $\theta_2 = 45°$


(a) Sketch this setup. 
(b) Calculate the matrix associated with the setup. 
(c) Repeat step (b) using the code tomomap. 


4. Suppose you have the setup where 


• Height and width of the image in voxels: $n = 2$ (Total voxels $N = 4$)
• Pixels per view in the radiograph: $m = 2$
• ScaleFac = $\sqrt{2}$
• Number of views: $a = 2$
• Angle of the views: $\theta_1 = 45°$, $\theta_2 = 135°$


(a) Sketch this setup.
(b) Calculate the matrix associated with the setup.
(c) Repeat step (b) using the code tomomap.

5. Suppose you have the setup where 


• Height and width of the image in voxels: $n = 2$ (Total voxels $N = 4$)
• Pixels per view in the radiograph: $m = 4$
• ScaleFac = $\sqrt{2}/2$
• Number of views: $a = 1$
• Angle of the views: $\theta_1 = 45°$


(a) Sketch this setup. 
(b) Calculate the matrix associated with the setup. 
(c) Repeat step (b) using the code tomomap. 


6. Suppose you have the setup where 


• Height and width of the image in voxels: $n = 4$ (Total voxels $N = 16$)
• Pixels per view in the radiograph: $m = 2$
• ScaleFac = 1
• Number of views: $a = 2$
• Angle of the views: $\theta_1 = 0°$, $\theta_2 = 45°$


(a) Sketch this setup. 

(b) Calculate the matrix associated with the setup. 

(c) Find the radiographs of images A, B, and C from Section 2.1 under this transformation. 
(d) Repeat steps (b) and (c) using the code tomomap. 


7. Suppose you have the setup where 


• Height and width of the image in voxels: $n = 4$ (Total voxels $N = 16$)
• Pixels per view in the radiograph: $m = 4$
• ScaleFac = 1
• Number of views: $a = 3$
• Angle of the views: $\theta_1 = 0°$, $\theta_2 = 25.5°$, and $\theta_3 = 90°$


(a) Sketch this setup. 
(b) Calculate the matrix associated with the setup using tomomap. 
(c) Find the radiographs of images A, B, and C from section 2.1 under this transformation. 


8. A block matrix is a matrix of matrices. In other words, it is a large matrix that has been partitioned 
into sub-matrices. We usually represent the block matrix by drawing vertical and horizontal lines 
between the blocks. For example, the matrix 


$$A = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 3 \end{pmatrix}$$


can be considered as the block matrix

(a) Choose one of the two-view radiographic setups from Exercises 2, 3, 5, or 6. Find the two
matrices associated with each of the component single-view radiographic transformations. 
Compare these two matrices with the overall radiographic transformation matrix. What do 
you notice? 

(b) Repeat this for the radiographic setup from Exercise 7. 

(c) In general, the transformation associated with a multiple-view radiograph can be represented 
as a block matrix in a natural way, where the blocks represent the transformations associated 
with the component single-view radiographs. Suppose that you know that for a particular 
radiographic setup with k views, the individual views are represented by the matrices 
$T_1, \ldots, T_k$. What is the block matrix that represents the overall transformation $T$?


Radiographs of Linear Combinations of Objects. 
Take the two objects in Exercise 2 to be x (left object) and y (right object). For each of the 
transformations in Exercises 2, 4, and 5, answer the following questions. 


9. Determine the radiographs of the following objects. 


(a) 3x 
(b) 0.5y 
(c) 3x + 0.5y 


10. Generalize these observations to arbitrary linear combinations of object vectors. Write your 
conjecture(s) in careful mathematical notation. 


4.2 Transformations 


In Section 4.1, we became more familiar with the radiographic process. We identified two important
features of this process. First, the process takes vectors in an object space to vectors in a radiograph
space. Second, if we compute the radiograph of a linear combination of objects, this radiograph is
a linear combination of the radiographs corresponding to those objects. That is, if $x_1, x_2$ are objects
corresponding to radiographs $b_1$ and $b_2$, respectively, then the radiograph corresponding to $\alpha x_1 + x_2$,
for some scalar $\alpha$, is $\alpha b_1 + b_2$. At first, this property may not seem very useful or informative. However,
suppose we have detailed knowledge about part of the object that is being radiographed. Suppose $x_{\text{exp}}$
is the expected part of the object and we wish to discover how different the actual object, $x$, is from
what is expected. One possible scenario is illustrated in Figure 4.6 where a known (or expected)
object (at left) is similar to the actual (unknown) object (at right). The circle represents an unknown
or unexpected part of the true object. We find the unexpected object as part of the difference object
$x_{\text{diff}} = x - x_{\text{exp}}$. Because we know the radiographic transformation, $T$, we can produce an expected
radiograph $b_{\text{exp}} = T x_{\text{exp}}$. We then attempt to recover the radiograph that corresponds to the difference
object. That is:

\[ b = Tx = T(x_{\text{exp}} + x_{\text{diff}}) = b_{\text{exp}} + b_{\text{diff}}, \]

so that the difference object is one that produces the difference radiograph $T x_{\text{diff}} = b_{\text{diff}}$. While this
problem is mathematically equivalent to solving $Tx = b$, the significant knowledge of $x_{\text{exp}}$ can help
make the problem more tractable. For example, the difference image may be in a subspace of relatively
small dimension, or the process of finding $x_{\text{diff}}$ may be less prone to uncertainties.
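The relationship between the difference object and the difference radiograph can be checked numerically. The following is a minimal sketch in Python; the matrix `T` and the object vectors are illustrative assumptions, not values from the text.

```python
# Hypothetical 2-pixel radiograph of a 4-voxel object. The entries of T and
# of the objects below are assumptions chosen only for illustration.
T = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]

def apply(matrix, x):
    """Return the matrix-vector product of matrix and x as a list."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in matrix]

x_exp = [1.0, 2.0, 0.0, 1.0]                 # expected object
x_diff = [0.0, 0.0, 0.5, 0.0]                # unexpected part of the object
x = [e + d for e, d in zip(x_exp, x_diff)]   # actual object

b = apply(T, x)                              # measured radiograph
b_exp = apply(T, x_exp)                      # radiograph of the expected object
b_diff = [bi - ei for bi, ei in zip(b, b_exp)]

# By linearity, the difference radiograph is the radiograph of x_diff.
print(b_diff, apply(T, x_diff))              # [0.5, 0.0] [0.5, 0.0]
```

In practice only `b` and `b_exp` would be known, and the task would be to recover `x_diff` from `b_diff`.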

A similar scenario arises when comparing radiographs of the “same” object taken at different times.
Suppose we radiograph the object $x_{\text{exp}}$, as in Figure 4.6 (on the left), and find that the radiograph is
$b_{\text{exp}}$. But weeks later, we radiograph the same object (or so we think) and we get a radiograph that is
1.3 times the radiograph $b_{\text{exp}}$. This could mean that the object now looks more like one of the objects
we see in Figure 4.7. We notice that the object density is proportionally larger, possibly meaning the
object became more dense (as indicated by a darker color on the left in Figure 4.7) or another object is
present along with the object we expected (represented by the figure on the right). Again, the change
in the object is given by $x_{\text{diff}}$.

Fig. 4.6 Two possible objects. Left: An expected scenario $x_{\text{exp}}$. Right: The actual scenario $x$.

Fig. 4.7 Two objects with more mass than might be expected, indicating more density within the object being
radiographed.

In this section, we will explore the properties of transformations such as radiographic transformations 
that possess the property that linear combinations of inputs result in the same linear combination of 
outputs. We call such transformations linear. 

In order to avoid cumbersome notation, we will write vector spaces $(V, +, \cdot)$ as $V$, leaving to the
reader the responsibility of understanding that each vector space is defined with specific operations of
vector addition and scalar multiplication. Each vector space is also defined over a field, which the
reader must also infer from the context. For example, the declarative

“Let $(V, +, \cdot)$ and $(W, \oplus, \odot)$ be vector spaces over the field $\mathbb{F}$ and let $a, b \in \mathbb{F}$.”

will be shortened to

“Let $V$ and $W$ be vector spaces and let $a$ and $b$ be scalars.”

without loss of meaning.


4.2.1 Transformations are Functions


Simply stated, a transformation is a function whose domain and codomain are vector spaces. The point 
is that the word transformation is not referring to a new concept, rather a very familiar concept with 
the added context of vector spaces. 


Definition 4.2.1 


A function $T : V \to W$ is called a transformation if the domain, $V$, and the codomain, $W$, are
vector spaces.


Not all functions are transformations. Consider the function whose input is the date and time and
whose output is the shirt color of the person closest to you. This function is only defined for dates
and times for which you were living, and this domain is not a vector space. The function outputs could
be considered RGB values in $\mathbb{R}^3$ (a vector space). However, the closest person may not be wearing a
shirt, in which case an acceptable function output is “no color,” which is not an element of $\mathbb{R}^3$.


Example 4.2.2 Let $A$ be the set of all possible angles measured counterclockwise from a particular
ray in $\mathbb{R}^2$. The set $A$ is a vector space. Let $T : A \to \mathbb{R}$ be the transformation defined by $T(a) = \cos(a)$.
Notice that $T(a)$ is defined for all $a$ in the domain $A$. We also know that $-1 \le \cos(a) \le 1$ for all $a \in A$.
But, we still write that $T : A \to \mathbb{R}$ because $[-1, 1] \subseteq \mathbb{R}$. In this case, $T$ transforms vectors from the
space of angles into vectors in the space of real numbers.


Definition 4.2.3 


The range of a function (and therefore any transformation) $T : V \to W$ is the set of all codomain
vectors $y$ for which there exists a domain vector $x$ such that $T(x) = y$.


Definition 4.2.3 is equivalently written:
Let $T : V \to W$. The range of $T$ is the set $\{y \in W \mid T(x) = y \text{ for some } x \in V\}$.


It is not necessary for a transformation to have the potential to output all possible vectors in the
codomain. The transformation outputs need only be vectors in the codomain. In Example 4.2.2, the
transformation has codomain $\mathbb{R}$ and range $[-1, 1]$. This particular example tells us that the range of a
transformation need not be a vector space.


4.2.2 Linear Transformations 


As we discussed in the introduction to this section, some transformations have the interesting and
useful property of preserving linear combinations. That is, transforming a linear combination of
domain vectors is equivalent to first transforming the individual domain vectors and then computing
the appropriate linear combination in the codomain.

Definition 4.2.4


Let $V$ and $W$ be vector spaces. We say that $T : V \to W$ is a linear transformation if for every
pair of vectors $v_1, v_2 \in V$ and every scalar $\alpha$,

\[ T(\alpha \cdot v_1 + v_2) = \alpha \cdot T(v_1) + T(v_2). \tag{4.1} \]


Equation 4.1 is called the linearity condition for transformations. Not all transformations are linear, 
though many familiar examples are. In fact, the example from Section 4.1 led us to the discussion of 
linear transformations and is, therefore, our first example. 
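The linearity condition can also be probed numerically before attempting a proof. The sketch below is one possible approach, assuming vectors in $\mathbb{R}^n$ are stored as Python lists; the helper name `satisfies_linearity` and the two sample maps are hypothetical. Passing the test is only evidence of linearity, while a single failure is a genuine counterexample.

```python
import random

def satisfies_linearity(T, dim, trials=100, tol=1e-9):
    """Test T(a*v1 + v2) == a*T(v1) + T(v2) on random inputs from R^dim."""
    rng = random.Random(0)  # fixed seed so the check is repeatable
    for _ in range(trials):
        v1 = [rng.uniform(-5, 5) for _ in range(dim)]
        v2 = [rng.uniform(-5, 5) for _ in range(dim)]
        a = rng.uniform(-5, 5)
        lhs = T([a * p + q for p, q in zip(v1, v2)])
        rhs = [a * s + t for s, t in zip(T(v1), T(v2))]
        if any(abs(l - r) > tol for l, r in zip(lhs, rhs)):
            return False  # found a counterexample
    return True

def swap(v):
    """Exchange the two coordinates (a linear map on R^2)."""
    return [v[1], v[0]]

def square(v):
    """Square the first coordinate (not a linear map)."""
    return [v[0] ** 2, v[1]]

print(satisfies_linearity(swap, 2))    # True
print(satisfies_linearity(square, 2))  # False
```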


Example 4.2.5 Let $V_O$ be the vector space of objects and $V_R$ be the space of radiographs. In Section 4.1,
we found that $T : V_O \to V_R$ is a transformation so that if we have two objects $x_1, x_2 \in V_O$ whose
corresponding radiographs are $T(x_1) = b_1$ and $T(x_2) = b_2$, then for scalar $\alpha$, $T(\alpha x_1 + x_2) = \alpha b_1 + b_2$.
Thus, radiographic transformations are examples of linear transformations.


We now consider a few examples to help understand how we determine whether or not a 
transformation is linear. 


Example 4.2.6 Consider the vector space $\mathcal{F} = \{f : \mathbb{R} \to \mathbb{R} \mid f \text{ is continuous}\}$ defined with the
standard operations of addition and scalar multiplication on functions (see Section 2.3). Consider a
fixed scalar $a$ and define $T_a : \mathcal{F} \to \mathbb{R}$ to be the transformation defined by $T_a(f) = f(a)$. $T_a$ is a linear
transformation.


Proof. Let $a \in \mathbb{R}$ and define $T_a$ as above. We will show that $T_a$ satisfies the linearity condition by
considering arbitrary functions $f, g \in \mathcal{F}$ and an arbitrary scalar $\alpha \in \mathbb{R}$. Then

\begin{align*}
T_a(\alpha f + g) &= (\alpha f + g)(a) \\
&= \alpha f(a) + g(a) \\
&= \alpha T_a(f) + T_a(g).
\end{align*}

Therefore, by Definition 4.2.4, $T_a$ is linear.


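Example 4.2.7 Consider the transformation $T : M_{2\times 3}(\mathbb{R}) \to \mathbb{R}^2$ defined as follows:

\[ T\begin{pmatrix} a & b & c \\ d & f & g \end{pmatrix} = \begin{pmatrix} a + b + c \\ d + f + g \end{pmatrix}. \]

We can show that $T$ is a linear transformation, again by testing the linearity condition. Consider
arbitrary vectors $v_1, v_2 \in M_{2\times 3}(\mathbb{R})$ and an arbitrary scalar $\alpha$. We will show that

\[ T(\alpha v_1 + v_2) = \alpha T(v_1) + T(v_2). \]

Let

\[ v_1 = \begin{pmatrix} a_1 & b_1 & c_1 \\ d_1 & f_1 & g_1 \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} a_2 & b_2 & c_2 \\ d_2 & f_2 & g_2 \end{pmatrix} \]

for some scalars $a_1, b_1, c_1, d_1, f_1, g_1, a_2, b_2, c_2, d_2, f_2, g_2 \in \mathbb{R}$. Then we know that

\begin{align*}
T(\alpha v_1 + v_2) &= T\begin{pmatrix} \alpha a_1 + a_2 & \alpha b_1 + b_2 & \alpha c_1 + c_2 \\ \alpha d_1 + d_2 & \alpha f_1 + f_2 & \alpha g_1 + g_2 \end{pmatrix} \\
&= \begin{pmatrix} \alpha a_1 + a_2 + \alpha b_1 + b_2 + \alpha c_1 + c_2 \\ \alpha d_1 + d_2 + \alpha f_1 + f_2 + \alpha g_1 + g_2 \end{pmatrix} \\
&= \begin{pmatrix} \alpha(a_1 + b_1 + c_1) + (a_2 + b_2 + c_2) \\ \alpha(d_1 + f_1 + g_1) + (d_2 + f_2 + g_2) \end{pmatrix} \\
&= \begin{pmatrix} \alpha(a_1 + b_1 + c_1) \\ \alpha(d_1 + f_1 + g_1) \end{pmatrix} + \begin{pmatrix} a_2 + b_2 + c_2 \\ d_2 + f_2 + g_2 \end{pmatrix} \\
&= \alpha \begin{pmatrix} a_1 + b_1 + c_1 \\ d_1 + f_1 + g_1 \end{pmatrix} + \begin{pmatrix} a_2 + b_2 + c_2 \\ d_2 + f_2 + g_2 \end{pmatrix} \\
&= \alpha T(v_1) + T(v_2).
\end{align*}

Therefore, $T$ maps linear combinations to linear combinations with the same scalars. So, $T$ is a linear
transformation.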
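As a concrete instance of the computation in Example 4.2.7, the following sketch evaluates both sides of the linearity condition for one choice of inputs. The particular matrices `v1`, `v2` and the scalar `alpha` are arbitrary assumptions made for illustration.

```python
def T(v):
    """Map a 2x3 matrix (a list of two rows) to the vector of its row sums."""
    return [sum(v[0]), sum(v[1])]

def combo(alpha, v1, v2):
    """Entrywise alpha*v1 + v2 for matrices stored as lists of rows."""
    return [[alpha * p + q for p, q in zip(r1, r2)] for r1, r2 in zip(v1, v2)]

v1 = [[1, 2, 3], [4, 5, 6]]
v2 = [[0, -1, 2], [7, 0, -3]]
alpha = 3

lhs = T(combo(alpha, v1, v2))                          # T(alpha*v1 + v2)
rhs = [alpha * s + t for s, t in zip(T(v1), T(v2))]    # alpha*T(v1) + T(v2)
print(lhs, rhs)                                        # [19, 49] [19, 49]
```

One agreeing example does not prove linearity; the proof above is what covers all inputs.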


Because linear transformations preserve linear combinations, we can relate this idea back to matrix 
multiplication examples from Section 3.1. 


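Example 4.2.8 Let $M$ be a $2 \times 3$ real matrix with columns $m_1, m_2, m_3 \in \mathbb{R}^2$. Define the transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ by

\[ T(x) = Mx. \]

Recall that multiplying by $M$ will result in a linear combination of the columns of $M$. Therefore, we
can rewrite $T$ as

\[ T\begin{pmatrix} a \\ b \\ c \end{pmatrix} = a\,m_1 + b\,m_2 + c\,m_3. \]

This form of $T$ inspires us to believe that it is a linear transformation. Indeed, let $x, y \in \mathbb{R}^3$. Then,

\[ x = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} d \\ e \\ f \end{pmatrix}, \]

for some scalars $a, b, c, d, e, f \in \mathbb{R}$. Therefore, if $\alpha$ is also a scalar, then

\begin{align*}
T(\alpha x + y) &= M(\alpha x + y) \\
&= M\begin{pmatrix} \alpha a + d \\ \alpha b + e \\ \alpha c + f \end{pmatrix} \\
&= (\alpha a + d)\,m_1 + (\alpha b + e)\,m_2 + (\alpha c + f)\,m_3 \\
&= \alpha\,(a\,m_1 + b\,m_2 + c\,m_3) + (d\,m_1 + e\,m_2 + f\,m_3) \\
&= \alpha M\begin{pmatrix} a \\ b \\ c \end{pmatrix} + M\begin{pmatrix} d \\ e \\ f \end{pmatrix} \\
&= \alpha T(x) + T(y).
\end{align*}

Thus, $T$ satisfies the linearity condition and is, therefore, a linear transformation.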


Example 4.2.9 In a first class in algebra, we learn that $f : \mathbb{R} \to \mathbb{R}$, defined by $f(x) = mx + b$ for
$m, b \in \mathbb{R}$, is called a “linear function.” If $f$ is a linear transformation, then it should satisfy the linearity
condition.

Let $x, y \in \mathbb{R}$ and let $\alpha$ be a scalar. Then

\begin{align*}
\alpha f(x) + f(y) &= \alpha(mx + b) + my + b \\
&= m(\alpha x + y) + b + \alpha b \\
&= f(\alpha x + y) + \alpha b.
\end{align*}

Notice that $\alpha f(x) + f(y) = f(\alpha x + y)$ for all scalars $\alpha$ only when $b = 0$. Thus, in general, $f$ is not a linear
transformation. The reason that $f$ is called a linear function is because the graph of $f$ is a line.
However, $f$ does not satisfy the definition of linear, and we do not consider this function to be linear.
Mathematicians call functions of the form $f(x) = mx + b$ affine functions.
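The leftover term $\alpha b$ in this computation is easy to observe numerically. In the sketch below, the slope `m = 2`, the intercept `b = 3`, and the helper name `linearity_gap` are assumptions chosen only for illustration.

```python
def f(x, m=2.0, b=3.0):
    """An affine function: its graph is a line with slope m and intercept b."""
    return m * x + b

def linearity_gap(x, y, alpha, m=2.0, b=3.0):
    """Return alpha*f(x) + f(y) - f(alpha*x + y), which works out to alpha*b."""
    return alpha * f(x, m, b) + f(y, m, b) - f(alpha * x + y, m, b)

print(linearity_gap(1.0, 4.0, 5.0))         # 15.0, which is alpha * b = 5 * 3
print(linearity_gap(1.0, 4.0, 5.0, b=0.0))  # 0.0, so f(x) = m*x passes this check
```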


Definition 4.2.10 


Let $V$ and $W$ be vector spaces. A transformation $T : V \to W$ is an affine transformation if there
exists a constant vector $b \in W$ such that $\hat{T}$, defined by $\hat{T}(x) = T(x) - b$, is a linear transformation.


Example 4.2.11 Consider the vector space $V$ (from Example 3.4.29) of color images that can be
created by a 12 megapixel phone camera. Suppose you have the image on the left in Figure 4.8 and you
want to lighten it to show more details of the cat you photographed (such as the image on the right). You
can apply a transformation $T : V \to V$ to the left image, $I_{\text{dark}}$. When adjusting the brightness of such
an image, we add more white to the whole image (so as not to change contrast). That is, we add a flat
image (all pixel intensities are equal). The transformation $T : V \to V$ used to brighten an image $I \in V$
is given by $T(I) = I + a\mathbf{1}$, where $\mathbf{1}$ is the image of constant value 1 and $a$ is a scalar brightening factor.

Fig. 4.8 Left: A 2448 × 3264 (approximately 8 megapixel) color image taken by a 12 megapixel phone camera. Right:
Same image lightened by adding a flat image.

Fig. 4.9 A better lightening of the left image in Figure 4.8.

$T$ is not a linear transformation. Indeed, if $I_1$ and $I_2$ are images in $V$, then $T(I_1 + I_2) = I_1 + I_2 + a\mathbf{1}$,
but $T(I_1) + T(I_2) = I_1 + a\mathbf{1} + I_2 + a\mathbf{1} = I_1 + I_2 + 2a\mathbf{1}$.

Notice, on the right in Figure 4.8, that such a transformation leaves us with a poorly lightened image.
Another transformation that can be performed on $I_{\text{dark}}$, to allow for better contrast, is to apply a nonlinear
transformation such as $T_2 : V \to V$ defined pixelwise by $T_2((p_{i,j,k})) = \left(255\sqrt{p_{i,j,k}/255}\right)$, where
$I_{\text{dark}} = (p_{i,j,k})$ for $1 \le i \le 2448$, $1 \le j \le 3264$, and $1 \le k \le 3$. The resulting image is found in
Figure 4.9. $T_2$ is not a linear transformation, and this is easily seen from the square root since, in general,
$\sqrt{a + b} \neq \sqrt{a} + \sqrt{b}$.
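Both failures of additivity in this example can be seen on a tiny stand-in image. The sketch below treats an image as a flat list of pixel intensities and ignores clipping at 255; the pixel values, the brightening factor `a = 40`, and the function names are assumptions made for illustration.

```python
import math

def brighten(image, a=40.0):
    """Add the flat image a*1 to every pixel (no clipping, for simplicity)."""
    return [p + a for p in image]

def root_lighten(image):
    """Apply the pixelwise map p -> 255 * sqrt(p / 255)."""
    return [255.0 * math.sqrt(p / 255.0) for p in image]

I1 = [0.0, 10.0, 20.0]
I2 = [5.0, 5.0, 5.0]
s = [p + q for p, q in zip(I1, I2)]

sum_then_brighten = brighten(s)                                          # I1 + I2 + a*1
brighten_then_sum = [p + q for p, q in zip(brighten(I1), brighten(I2))]  # I1 + I2 + 2a*1
print(sum_then_brighten)   # [45.0, 55.0, 65.0]
print(brighten_then_sum)   # [85.0, 95.0, 105.0]

# The square-root map also fails additivity: compare T2(I + I) with T2(I) + T2(I).
one_pixel = [100.0]
print(root_lighten([200.0])[0], 2 * root_lighten(one_pixel)[0])
```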


The domain and codomain of linear (and nonlinear) transformations are often the same vector space. 
We see this in the image brightening example. Consider two more examples. 


Example 4.2.12 Let us consider the vector space $D(\mathbb{Z}_2)$ of 7-bar LCD images given in Example 2.4.17.
Consider, also, the transformation $T : D(\mathbb{Z}_2) \to D(\mathbb{Z}_2)$ defined by adding a vector to itself. That is,
for $x \in D(\mathbb{Z}_2)$, $T(x) = x + x$. Notice that $T$ maps every vector to 0, the zero vector. We can see this
by considering the vector space properties of $D(\mathbb{Z}_2)$ and the scalar field properties of $\mathbb{Z}_2$:

\[ T(x) = x + x = 1 \cdot x + 1 \cdot x = (1 + 1) \cdot x = 0 \cdot x = 0. \]

Some specific examples are:

[Figure: LCD images, each mapped by $T$ to the image with no bars lit.]

Let $x, y \in D(\mathbb{Z}_2)$ and let $\alpha \in \mathbb{Z}_2$. Then

\begin{align*}
T(\alpha \cdot x + y) &= (\alpha \cdot x + y) + (\alpha \cdot x + y) \\
&= \alpha \cdot x + \alpha \cdot x + y + y \\
&= \alpha \cdot (x + x) + (y + y) \\
&= \alpha T(x) + T(y).
\end{align*}

Because the linearity condition holds for arbitrary domain vectors and arbitrary scalars, $T$ is a linear
transformation.
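Because $D(\mathbb{Z}_2)$ is finite, the claims in this example can be checked exhaustively. The sketch below assumes each LCD image is encoded as a tuple of seven bits (1 for a lit bar), with addition and scalar multiplication taken modulo 2; this encoding is an assumption made for illustration.

```python
from itertools import product

def add(x, y):
    """Vector addition in Z_2^7: a bar is lit iff exactly one input has it lit."""
    return tuple((p + q) % 2 for p, q in zip(x, y))

def scale(a, x):
    """Scalar multiplication by a in Z_2 = {0, 1}."""
    return tuple((a * p) % 2 for p in x)

def T(x):
    """Add a vector to itself."""
    return add(x, x)

vectors = list(product((0, 1), repeat=7))   # all 128 bar patterns
zero = (0,) * 7

all_zero = all(T(x) == zero for x in vectors)
linear = all(
    T(add(scale(a, x), y)) == add(scale(a, T(x)), T(y))
    for a in (0, 1) for x in vectors for y in vectors
)
print(all_zero, linear)   # True True
```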


In Example 4.2.12, we introduced the zero transformation, the transformation $T : V \to W$ that
maps all vectors in $V$ to $0 \in W$ (the additive identity vector in $W$).


Definition 4.2.13


Let $V$ and $W$ be vector spaces and $T : V \to W$. We say that $T$ is the zero transformation if
$T(v) = 0 \in W$ for all $v \in V$.


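Example 4.2.14 Consider the vector space $\mathbb{R}^2$ viewed as a 2D coordinate grid with $x$ (horizontal) and
$y$ (vertical) axes. All vectors in $\mathbb{R}^2$ can then be viewed using arrows that originate at the origin and
terminate at a point indicated by the coordinates of the vector. Using this picture of $\mathbb{R}^2$, we consider
the transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ that is a combination of three other transformations: $T_1 : \mathbb{R}^2 \to \mathbb{R}^2$
first rotates the vector 90° counterclockwise about the origin, next $T_2 : \mathbb{R}^2 \to \mathbb{R}^2$ reflects it across
the line $y = x$, and finally, $T_3 : \mathbb{R}^2 \to \mathbb{R}^2$ reflects it over the $x$-axis. That is, $T(x) = T_3(T_2(T_1(x)))$ for
all $x \in \mathbb{R}^2$. For example,

\[ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \overset{T_1}{\mapsto} \begin{pmatrix} -2 \\ 1 \end{pmatrix} \overset{T_2}{\mapsto} \begin{pmatrix} 1 \\ -2 \end{pmatrix} \overset{T_3}{\mapsto} \begin{pmatrix} 1 \\ 2 \end{pmatrix}. \]

(Here, we use the notation “$a \overset{f}{\mapsto} b$” to indicate that $f(a) = b$. We read it as “$a$ maps to $b$.”) This
example suggests that this transformation maps a vector onto itself. We can show that this is, indeed,
true. Let $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, then we have

\[ T(x) = T_3\left(T_2\left(T_1\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right)\right) = T_3\left(T_2\begin{pmatrix} -x_2 \\ x_1 \end{pmatrix}\right) = T_3\begin{pmatrix} x_1 \\ -x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x. \]

This transformation is also linear. Let $x, y \in \mathbb{R}^2$ and let $\alpha \in \mathbb{R}$. Then $T(\alpha x + y) = \alpha x + y = \alpha T(x) + T(y)$
and the linearity condition holds.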
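The three steps of this composition can be traced in code. The sketch below represents vectors in $\mathbb{R}^2$ as tuples, and the coordinate formulas for the rotation and the two reflections are the standard ones.

```python
def T1(v):
    """Rotate 90 degrees counterclockwise about the origin."""
    x, y = v
    return (-y, x)

def T2(v):
    """Reflect across the line y = x."""
    x, y = v
    return (y, x)

def T3(v):
    """Reflect over the x-axis."""
    x, y = v
    return (x, -y)

def T(v):
    """The composition T3 o T2 o T1."""
    return T3(T2(T1(v)))

v = (1, 2)
print(T1(v), T2(T1(v)), T(v))   # (-2, 1) (1, -2) (1, 2)
```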


Definition 4.2.15 


Let $V$ be a vector space and $T : V \to V$. We say that $T$ is the identity transformation if $T(v) = v$
for all $v \in V$.


Another very useful transformation is one that transforms a vector space into the corresponding
coordinate space. We know that, given a basis for an $n$-dimensional vector space $V$, we are able to represent
any vector $v \in V$ as a coordinate vector in the vector space $\mathbb{R}^n$. Suppose $\mathcal{B} = \{v_1, v_2, \ldots, v_n\}$ is a basis
for $V$. Recall that we find the coordinate vector $[v]_{\mathcal{B}}$ by finding the scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$ that make
the linear combination $v = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n$, giving us

\[ [v]_{\mathcal{B}} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}. \]

The transformation that maps a vector to its coordinate vector is called the coordinate transformation.
Our next theorem shows that the coordinate transformation is linear.


Theorem 4.2.16
Let $V$ be an $n$-dimensional vector space and $\mathcal{B}$ an ordered basis for $V$. Then $T : V \to \mathbb{R}^n$ defined
by $T(v) = [v]_{\mathcal{B}}$ is a linear transformation.


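Proof. Suppose $V$ is a vector space and $\mathcal{B}$ an ordered basis for $V$. Let $u, v \in V$ and $\alpha$ be a scalar. Then
there are scalars $a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n$ so that

\[ [u]_{\mathcal{B}} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} \quad\text{and}\quad [v]_{\mathcal{B}} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}. \]

Notice that

\begin{align*}
T(\alpha u + v) &= [\alpha u + v]_{\mathcal{B}} \\
&= \begin{pmatrix} \alpha a_1 + b_1 \\ \alpha a_2 + b_2 \\ \vdots \\ \alpha a_n + b_n \end{pmatrix} \\
&= \alpha \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \\
&= \alpha [u]_{\mathcal{B}} + [v]_{\mathcal{B}} \\
&= \alpha T(u) + T(v).
\end{align*}

Thus, the linearity condition holds and $T$ is a linear transformation.


Watch Your Language!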


It is important to notice the subtle difference between an everyday function and a transformation. A 
function need not have vector space domain and codomain, whereas a transformation requires both to 
be vector spaces. Linear transformations are particular transformations that preserve linear combinations. 


It is correct to say 


✓ The range of a transformation is the set of all outputs.
✓ Linear transformations satisfy the linearity property.
✓ Linear transformations preserve linear combinations.


But it would be incorrect to say


✗ The codomain of a transformation is the set of all outputs.


4.2.3 Properties of Linear Transformations 


The transformations $T_1$, $T_2$, and $T_3$ from Example 4.2.14 are all linear (the proofs are left to the reader).
This leads us to ask whether composing linear transformations always yields a
linear transformation. What about other typical function operations on linear transformations? Do they
also lead to linear transformations? These questions are answered in this section. First, let us define
some of the common operations on transformations.


Definition 4.2.17 


Let $U$ and $V$ be vector spaces and let $T_1 : V \to U$ and $T_2 : V \to U$ be transformations. We define
the transformation sum, $T_1 + T_2 : V \to U$, by

\[ (T_1 + T_2)(v) = T_1(v) + T_2(v) \]

for every $v \in V$. We define the transformation difference, $T_1 - T_2 : V \to U$, by

\[ (T_1 - T_2)(v) = T_1(v) + (-1) \cdot T_2(v) \]

for every $v \in V$.


Example 4.2.18 Consider the transformations $T_1 : \mathbb{R}^2 \to \mathbb{R}^2$ and $T_2 : \mathbb{R}^2 \to \mathbb{R}^2$ given by


x x-y x x 
‘G) ce) - (5) oe) 
me x—y x x-—ytx 2x —y 
— »(;) (ee) Pla.) oe) Gane 


As a particular realization, notice that 


n(:)-()- 2G)-G)- 2G)+*Q)-(): 


We have 


and, therefore, 


Definition 4.2.19 


Let $U$, $V$, and $W$ be vector spaces and let $T_1 : V \to W$ and $T_2 : W \to U$ be transformations. We
define the composition transformation

\[ T_2 \circ T_1 : V \to U \]

by $(T_2 \circ T_1)(v) = T_2(T_1(v))$ for every $v \in V$.


Multiple sequential compositions of transformations, such as we saw in Example 4.2.14, are
commonly written without extraneous parenthetical notation. In the example, we have $T = T_3 \circ (T_2 \circ T_1)$,
but we write simply $T = T_3 \circ T_2 \circ T_1$.


Example 4.2.20 Consider the transformations T, : R? — R? given by 


. (") = es 
Ts (5) =a+b. 


and T) : R* — R given by 
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(To 0 T;) (;) =T> (n (*)) = T% 4 =3x+y-x=2x+y. 


Considering the examples of combinations of transformations that we have encountered, it is useful 
to know whether these new transformations preserve linearity. Is the sum of linear transformations also 
linear? Is the composition of linear transformations also linear? The answer is yes and is presented in 
the following theorem. 


Then 


Theorem 4.2.21
Let $U$, $V$, and $W$ be vector spaces and let $T_1 : V \to W$, $T_2 : V \to W$, and $T_3 : W \to U$ be linear
transformations. Then the following transformations are linear:

1. $T_1 + T_2 : V \to W$,
2. $T_1 - T_2 : V \to W$,
3. $\alpha T_1 : V \to W$, for any scalar $\alpha$,
4. $T_3 \circ T_1 : V \to U$.


The reader is asked to provide a proof in Exercise 39. 
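Before writing the proof, it can help to see the four constructions of Theorem 4.2.21 pass the linearity condition on sample inputs. The sketch below uses three hypothetical linear maps on tuples of integers; the maps, the helper names, and the test vectors are assumptions made for illustration, and a passing check is evidence rather than a proof.

```python
def T1(v):                      # a linear map R^2 -> R^2 (hypothetical)
    x, y = v
    return (x - y, 2 * x)

def T2(v):                      # a linear map R^2 -> R^2 (hypothetical)
    x, y = v
    return (x, x + y)

def T3(w):                      # a linear map R^2 -> R^1 (hypothetical)
    a, b = w
    return (a + b,)

def t_sum(S, U):
    return lambda v: tuple(s + u for s, u in zip(S(v), U(v)))

def t_diff(S, U):
    return lambda v: tuple(s + (-1) * u for s, u in zip(S(v), U(v)))

def t_scale(c, S):
    return lambda v: tuple(c * s for s in S(v))

def compose(S, U):
    return lambda v: S(U(v))

def lin_comb(alpha, v1, v2):
    return tuple(alpha * p + q for p, q in zip(v1, v2))

def check(T, alpha, v1, v2):
    """Evaluate the linearity condition for T at one choice of inputs."""
    return T(lin_comb(alpha, v1, v2)) == lin_comb(alpha, T(v1), T(v2))

candidates = [t_sum(T1, T2), t_diff(T1, T2), t_scale(4, T1), compose(T3, T1)]
results = [check(T, 3, (1, 2), (-4, 5)) for T in candidates]
print(results)   # [True, True, True, True]
```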


In examples like Example 4.2.9, we might wonder why something whose graph is a line is not
called linear. Here, we explore what went wrong with affine functions. When looking at our work, we
see that, in order for $f$ to be linear, the extra term $\alpha b$ must equal zero for all scalars $\alpha$. This only occurs when
$b = 0$, that is, when $f(x) = mx$. In that case,

\begin{align*}
f(\alpha x + y) &= m(\alpha x + y) \\
&= \alpha(mx) + my \\
&= \alpha f(x) + f(y).
\end{align*}

That means $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = mx$ is linear. This brings to light a very important property
of linear transformations.


Theorem 4.2.22
Let $V$ and $W$ be vector spaces. If $T : V \to W$ is a linear transformation, then $T(0_V) = 0_W$,
where $0_V$ and $0_W$ are the zero vectors of $V$ and $W$, respectively.


Proof. Let $V$ and $W$ be vector spaces and let $T : V \to W$ be a linear transformation. We know that,
for any scalar $\alpha$, $\alpha 0_V = 0_V$. So,

\[ T(0_V) = T(\alpha 0_V) = \alpha T(0_V). \tag{4.2} \]

In particular, when $\alpha = 0$ we see that $T(0_V) = 0_W$.


Theorem 4.2.22 gives us a quick check to see whether a transformation is not linear. If $T(0_V) \neq 0_W$,
then $T$ is not linear. The converse is not necessarily true. The fact that $T(0_V) = 0_W$ is not a sufficient
test for linearity. Consider the following examples.


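Example 4.2.23 Consider again the vector space $D(\mathbb{Z}_2)$. Define $T : D(\mathbb{Z}_2) \to \mathbb{R}$ to be the transformation
that counts the number of “lit” bars. For example, $T$ maps an LCD image with five lit bars to 5 and
the image with all seven bars lit to 7.

[Figure: two LCD images with $T$-values 5 and 7.]

Therefore, $0 \in D(\mathbb{Z}_2)$ maps to $0 \in \mathbb{R}$. But, $T$ is not a linear transformation.
We can see this by considering the following example. Notice that, for the LCD images $d_1$ and $d_2$ shown,

[Figure: the LCD images $d_1$, $d_2$, and $d_1 + d_2$.]

\[ T(d_1 + d_2) = 5, \quad\text{but}\quad T(d_1) + T(d_2) = 5 + 2 \neq 5. \]

Thus, $T$ is not linear.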
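A small computation shows how a transformation can send the zero vector to zero and still fail to be linear. As a sketch, assume LCD images are encoded as 7-tuples of bits with addition modulo 2; the two patterns `d1` and `d2` below are hypothetical and were chosen to share exactly one lit bar.

```python
def add(x, y):
    """Addition in Z_2^7: bars lit in both inputs cancel."""
    return tuple((p + q) % 2 for p, q in zip(x, y))

def count_lit(x):
    """Count the lit bars of an LCD image."""
    return sum(x)

zero = (0,) * 7
d1 = (1, 1, 1, 1, 1, 0, 0)   # five lit bars
d2 = (1, 0, 0, 0, 0, 1, 0)   # two lit bars, one of them shared with d1

print(count_lit(zero))                  # 0
print(count_lit(add(d1, d2)))           # 5
print(count_lit(d1) + count_lit(d2))    # 7
```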


Example 4.2.24 Consider again the image brightening Example 4.2.11. The transformation is $T(I) = I + a\mathbf{1}$.
The zero image, represented as all black pixels, is transformed into a nonzero image of constant
intensity $a$. Thus, for any $a \neq 0$, $T$ is not linear.


Linear transformations have the special property that they can be defined in terms of their action on 
a basis for the domain. First, we define what it means for two transformations to be equal, then show 
two important results. 




Definition 4.2.25 


Let $V$ and $W$ be vector spaces. We say that two transformations $T_1 : V \to W$ and $T_2 : V \to W$ are
equal, and write $T_1 = T_2$, if $T_1(x) = T_2(x)$ for all $x \in V$.


We have seen an example of two equal transformations. Recall that the transformation in
Example 4.2.14 was written as the composition of three transformations and as the identity
transformation. The next theorem begins the discussion about how linear transformations act on basis
elements of the domain space.


Theorem 4.2.26
Let $V$ and $W$ be vector spaces and suppose $\{v_1, v_2, \ldots, v_n\}$ is a basis for $V$. Then for
$\{w_1, w_2, \ldots, w_n\} \subseteq W$, there exists a unique linear transformation $T : V \to W$ such that

\[ T(v_k) = w_k \quad\text{for } k = 1, 2, \ldots, n. \]


> Note about the proof: When proving the existence of something in mathematics, we need to be 
able to find and name the object which we are proving exists. Watch as we do that very thing in the 
following proof. 


Proof. Suppose $V$ and $W$ are vector spaces and $\mathcal{B} = \{v_1, v_2, \ldots, v_n\}$ is a basis for $V$. Let $x \in V$.
Then there exist unique scalars $a_1, a_2, \ldots, a_n$ such that $x = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n$. Also, suppose
$w_1, w_2, \ldots, w_n$ are arbitrary vectors in $W$. Define $T : V \to W$ by

\[ T(x) = a_1 w_1 + a_2 w_2 + \cdots + a_n w_n. \]

Now, since $v_k$ is uniquely written as a linear combination of $v_1, v_2, \ldots, v_n$ using scalars

\[ a_i = \begin{cases} 0 & \text{if } i \neq k \\ 1 & \text{if } i = k, \end{cases} \]

we know that $T(v_k) = w_k$ for $k = 1, 2, \ldots, n$. We next show that $T$ is linear and that $T$ is unique.

Linearity

To show that $T$ is linear, we consider $y, z \in V$ and scalar $\alpha$, and show that
$T(\alpha y + z) = \alpha T(y) + T(z)$.

Because $\mathcal{B}$ is a basis of $V$, there exist scalars $b_k$ and $c_k$, for $k = 1, 2, \ldots, n$, so that $y = b_1 v_1 + b_2 v_2 + \cdots + b_n v_n$
and $z = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$. We have

\begin{align*}
T(\alpha y + z) &= T\left(\alpha \sum_{k=1}^{n} b_k v_k + \sum_{k=1}^{n} c_k v_k\right) \\
&= T\left(\sum_{k=1}^{n} (\alpha b_k v_k + c_k v_k)\right) \\
&= T\left(\sum_{k=1}^{n} (\alpha b_k + c_k) v_k\right) \\
&= \sum_{k=1}^{n} (\alpha b_k + c_k) w_k \\
&= \alpha \sum_{k=1}^{n} b_k w_k + \sum_{k=1}^{n} c_k w_k \\
&= \alpha T(y) + T(z).
\end{align*}

Therefore, by Definition 4.2.4, $T$ is a linear transformation.


Uniqueness

To show that $T$ is unique we will show that, for any other linear transformation $U : V \to W$ with the
property that $U(v_k) = w_k$ for $k = 1, 2, \ldots, n$, it must be true that $U = T$. Let $x \in V$, then there are
unique scalars $a_1, a_2, \ldots, a_n$ so that

\[ x = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n. \]

Therefore, by linearity of $U$,

\[ U(x) = U\left(\sum_{k=1}^{n} a_k v_k\right) = \sum_{k=1}^{n} a_k U(v_k) = \sum_{k=1}^{n} a_k w_k = T(x). \]

Thus, by definition of equal transformations, $U = T$.


Theorem 4.2.26 tells us that there is always one and only one linear transformation that maps the basis
elements of an $n$-dimensional vector space to $n$ vectors of our choosing in another vector space. The
following corollary makes clear an important result that follows directly.


Corollary 4.2.27

Let $V$ and $W$ be vector spaces and suppose $\{v_1, v_2, \ldots, v_n\}$ is a basis for $V$. If $T : V \to W$
and $U : V \to W$ are linear transformations such that $T(v_k) = U(v_k)$ for $k = 1, 2, \ldots, n$, then
$T = U$.


Proof. Suppose $V$ and $W$ are vector spaces and $\{v_1, v_2, \ldots, v_n\}$ is a basis for $V$. Let $T, U$ be
linear transformations defined as above. Let $w_k = T(v_k)$ for $k = 1, 2, \ldots, n$ with $w_k \in W$. Then,
by supposition, $U(v_k) = w_k$. So by Theorem 4.2.26, $T = U$.


Corollary 4.2.27 shows us that the action of a transformation on a basis for the domain fully 
determines the action the transformation will have on any domain vector. There cannot be two different 
linear transformations that have the same effect on a basis for the domain. 

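Consider the radiographic transformation of Section 4.1.4. The domain is the vector space of $2 \times 2$
voxel objects and the codomain is the vector space of 2 pixel radiographs. The transformation is fully
defined by its action on a basis for the domain. Suppose we consider the standard basis

\[ \mathcal{B} = \left\{ \begin{array}{|c|c|} \hline 1 & 0 \\ \hline 0 & 0 \\ \hline \end{array}\,,\ \begin{array}{|c|c|} \hline 0 & 0 \\ \hline 1 & 0 \\ \hline \end{array}\,,\ \begin{array}{|c|c|} \hline 0 & 1 \\ \hline 0 & 0 \\ \hline \end{array}\,,\ \begin{array}{|c|c|} \hline 0 & 0 \\ \hline 0 & 1 \\ \hline \end{array} \right\}. \]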


The action of the transformation on these basis vectors yields four radiographs: 


0 1 5 
T(B) = 5 ’ 1 ’ @) ’ va . 


No other linear transformation transforms the vectors of $\mathcal{B}$ into the vectors in $T(\mathcal{B})$. No other
information about $T$ is needed in order to determine its action on arbitrary vectors. As a specific


example, suppose 
2/3 
t= ’ 
fare 


and we wish to find the radiograph $b$ that results in transforming $x$ under the transformation $T$. Using
the fact that $T$ is linear, we have


NI- 


0}. 0] 0 
+3. +1- 
0| 0 o}1 


0/0 0/1 0/0 
+3-T +12t 
1/0 0 | 0 Oj} 1 


+3: 0 +1- 


All we needed to know is how T acts on a basis for the object space as well as the coordinates of the 
given object in the same basis. The reader should compare this result carefully with the discussion in 
Section 4.1.4. These ideas will be explored more completely in Section 4.4. 
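The idea that a linear transformation is determined by its action on a basis can be sketched in code. The example below assumes the domain is $\mathbb{R}^n$ with the standard basis, so that the coordinates of a vector are simply its entries. The helper `from_basis_images` and the images in `W` are hypothetical and are not the radiographs of Section 4.1.4.

```python
def from_basis_images(images):
    """Return the linear map T: R^n -> R^m with T(e_k) = images[k].

    Here e_1, ..., e_n is the standard basis, so T(x) is the linear
    combination of the given images weighted by the entries of x.
    """
    m = len(images[0])

    def T(x):
        return [sum(a_k * w_k[i] for a_k, w_k in zip(x, images)) for i in range(m)]

    return T

# Hypothetical images in R^2 of the four standard basis vectors of R^4.
W = [[1, 0], [0, 1], [1, 1], [2, -1]]
T = from_basis_images(W)

print(T([0, 0, 1, 0]))   # [1, 1], the image assigned to the third basis vector
print(T([2, 4, 3, 1]))   # [7, 6], which is 2*w1 + 4*w2 + 3*w3 + 1*w4
```

For a basis other than the standard one, the entries of `x` would be replaced by the coordinates of the vector with respect to that basis.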


4.2.4 Exercises 


For Exercises 1 through 13, determine which of the given transformations are linear. For each, provide
a proof or counterexample as appropriate.


1. Define M to be the 3 x 2 matrix 


4.2 


io) 
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and define f : R? + R? by f(v) = Mv 


. Define 


and f 2? > R? by f(o) = M (3) + 


. Define M as in Exercise 2 and define T : R? > R? by T(v) = Mv. 
. Define f : R? + R? by f(v) = Mv + x, where 


121 1 
ICES, and *=(5) 


5. Let $T : P_2 \to \mathbb{R}$ be defined by $T(ax^2 + bx + c) = a + b + c$.
6. Define $F : V \to P_1$, where

\[ V = \{ax^2 + (3a - 2b)x + b \mid a, b \in \mathbb{R}\} \subseteq P_2, \]

by

\[ F(ax^2 + (3a - 2b)x + b) = 2ax + 3a - 2b. \]


. Define G : Pz > M 2x2 by 


2 _ a a-—b 
al torta=(_7, 273). 


. Define hh: V > P}, where 


a boc 
a eer 


abceR| SC Max3 


by 
h ae =ax+ec 
Ob-—c2a}) 
Let 
-——_ 3a 
—b +—— _ 2a 
T= ¢Il=-——_ 0 a,b,cER 
c ~—_ 0 
—— 3c 


And define f : Z > P2 by 
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T=Of e i i i i e 
x=Ocm 2=5 em 


Fig.4.10 Example of averaging heat state transformation. 
fi) =ax* + @ +07 4+ G40. 


10. Define f : M2x2 > R* by 


SY 
a 
2 8 
Qa 
NS 
Nara 


11. Define f : P2 > R?* by 
flax? +bx +0 = (o*%). 


Cc 


12. Let $H_4$ be the set of all possible heat states sampled every 1 cm along a 5 cm long rod. Define a
function $T : H_4 \to H_4$ by replacing each value (which does not correspond to an endpoint) with
the average of its neighbors. The endpoint values are kept at 0. An example of $T$ is shown in
Figure 4.10.

13. Define $O : D(\mathbb{Z}_2) \to \mathbb{Z}_2$ by $O(d) = 1$ if $d$ is an LCD digit with at least one lit bar, and $O(d) = 0$
otherwise.


For Exercises 14-27, you will explore the geometry of a linear transformation. Let $\square_1$ be a square
that resides in the positive quadrant of $\mathbb{R}^2$, has sides aligned with the $x$- and $y$-axes, a vertex at the origin,
and side length equal to 1. (It might be useful to know the result of Exercise 32: a linear transformation
$T$ maps line segments to line segments, so the image of a line segment whose endpoints are $a$ and $b$ is
the line segment with endpoints $T(a)$ and $T(b)$.)


14. Draw L) and write the vectors corresponding to each vertex. 
15. Consider the function S : R2 — R?, defined as 


x 20\ (x 
G)=) (0) 
Verify that S is a linear transformation. Use S to transform Ll). This means that you will find 


S(x, y) for each vertex on L];. What did S$ do to L,? 
16. Consider the function R; : R? > R?, defined as 


4.2 


17. 


18. 


19. 


20. 


21. 


22. 


23. 


24. 


Transformations 


x 1 0 x 
R = ‘ 
; (‘) ({ =) (") 
Verify that R, is a linear transformation. Use R to transform 
Consider the function Ry : R* — R2?, defined as 


x 1 0 x 
R = : 
° (‘) ({ ‘) (") 
Verify that R2 is a linear transformation. Use R2 to transform 
Consider the function T : R? > R?, defined as 


()=(10)(): 


1. What does R do to 


Verify that T is a linear transformation. Use T to transform 
Consider the function T : R? > R2, defined as 


*()=G0)(): 


Verify that T is a linear transformation. Use T to transform 


Consider the function S; : R? > R?, defined as 


»(1)=(01)(@). 


1. What does R2 do to 


1? 


1. What does T do to LJ;? 


1. What does T do to 1? 


Verify that Sz is a linear transformation. Use S2 to transform 


1. What does S2 do to 


Consider the function $3 : R2 — R?, defined as 


»()=(3)(). 


Verify that S3 is a linear transformation. Use $3 to transform 
Consider the function Ry : R* — R2, defined as 


1. What does 53 do to 


C=) 0), 


Verify that R2 is a linear transformation. Use R2 to transform 
Consider the function R3 : R? — R?, defined as 


1. What does R2 do to 


— (‘) ~ i (, ) (;). 


Verify that R3 is a linear transformation. Use R3 to transform 
Consider the function R4 : R* — R2, defined as 


1. What does R3 do to 


1? 


1? 
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elt \a8e (HE TVG 
i y) 2 \-1-1) \y)° 
Verify that $R_4$ is a linear transformation. Use $R_4$ to transform $\square_1$. What does $R_4$ do to $\square_1$?
25. Describe, geometrically, the transformations $S \circ T$ and $T \circ S$. What do they do to $\square_1$?


26. Describe, geometrically, the transformations $R_2 \circ S$ and $S \circ R_2$. What do they do to $\square_1$?
27. Create transformations that will transform $\square_1$ in the following ways:

(a) Stretch $\square_1$ horizontally by a factor of 5 and vertically by a factor of 7.
(b) Rotate $\square_1$ counterclockwise by an angle of 30°.
(c) Transform $\square_1$ to a vertical line segment of length 2.
(d) Transform $\square_1$ to a horizontal line segment of length 3.
(e) Reflect $\square_1$ over the $x$-axis.


For Exercises 28-31, define $\square_{(x_0, y_0)}$ to be $\square_1$ (defined for Exercises 14-27) translated up a distance $y_0$
and right a distance $x_0$. (If $y_0$ is negative, $\square_1$ is actually translated down. Similarly, if $x_0$ is negative,
$\square_1$ is translated left.)


28. Draw $\square_{(1,1)}$, $\square_{(-1,1)}$, and $\square_{(-2,-2)}$.

29. Describe what happens to each of $\square_{(1,1)}$, $\square_{(-1,1)}$, and $\square_{(-2,-2)}$ when you transform them by $R_4$
in Exercise 24.

30. Describe what happens to each of $\square_{(1,1)}$, $\square_{(-1,1)}$, and $\square_{(-2,-2)}$ when you transform them by $T$
in Exercise 18.

31. Describe what happens to each of $\square_{(1,1)}$, $\square_{(-1,1)}$, and $\square_{(-2,-2)}$ when you transform them by $S$ in
Exercise 15.
32. We can parameterize a line segment connecting two points in $\mathbb{R}^2$ as follows

\[ \ell(\lambda) = \lambda v + (1 - \lambda) u, \]

where $u$ and $v$ are the vectors corresponding to the two points and $0 \le \lambda \le 1$.

(a) Find the vectors corresponding to $\ell(0)$ and to $\ell(1)$.
(b) Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation. Show that $T$ maps the points on the line segment
to the points on another line segment in $\mathbb{R}^2$.


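33. Consider the vector space of functions

\[ D^1(\mathbb{R}) = \{ f : \mathbb{R} \to \mathbb{R} \mid f \text{ is continuous and } f' \text{ is continuous} \}. \]

Show that $T : D^1(\mathbb{R}) \to \mathcal{F}$ defined by $T(f) = f'$ is linear. Here, $\mathcal{F}$ is the vector space of functions given in Example 2.4.1.

34. Consider the space of functions

\[ R(\mathbb{R}) = \{ f : \mathbb{R} \to \mathbb{R} \mid f \text{ is integrable on } [a, b] \}. \]

Show that $T : R(\mathbb{R}) \to \mathbb{R}$ defined by $T(f) = \int_a^b f(x)\, dx$ is linear.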
35. Using Theorem 4.2.22, show that the function $f : \mathbb{R} \to \mathbb{R}$, defined by $f(x) = 3x - 7$, is not linear.

36. Consider, again, the vector space of 7-bar LCD images, $D(Z_2)$, from Example 2.4.17.

(a) Show that if we have a transformation $T : D(Z_2) \to V$, where $V$ is a vector space with the same scalar set $Z_2$, then $T$ is linear if $T(x + y) = T(x) + T(y)$ for all $x, y \in D(Z_2)$.

(b) Use Part (a) to show that if $T : D(Z_2) \to D(Z_2)$ is the transformation that flips the digits upside down, then $T$ is linear. Some example transformations are as follows:

[Figure: example LCD digit images and their upside-down images under $T$.]


37. Let $J_n(\mathbb{R})$ be the vector space of histograms with $n$ ordered bins with real values. Consider the "re-binning" transformation $T : J_{12}(\mathbb{R}) \to J_6(\mathbb{R})$ defined by $T(J) = K$, which re-bins data by adding the contents of bin pairs. That is, the value in the first bin of $K$ is the sum of the values of the first two bins of $J$, the value of the second bin of $K$ is the sum of the values of the third and fourth bins of $J$, etc. An example is shown in Figure 4.11. Show that $T$ is a linear transformation.

38. Consider $I_{512 \times 512}(\mathbb{R})$, the vector space of $512 \times 512$ grayscale radiographic images, and $J_{256}(\mathbb{R})$, the vector space of 256-bin histograms. Suppose $T : I_{512 \times 512}(\mathbb{R}) \to J_{256}(\mathbb{R})$ is the transformation which creates a histogram of the intensity values in a radiograph. More precisely, suppose $h = T(b)$ for some radiograph $b$ and let $h_k$ indicate the value for the $k$th histogram bin. The action of $T$ is defined as follows: $h_1$ is the number of pixels in $b$ with value less than one; $h_{256}$ is the number of pixels in $b$ with value greater than or equal to 255; otherwise, $h_k$ is the number of pixels in $b$ with value greater than or equal to $k - 1$ and less than $k$. Determine if $T$ is a linear transformation.

39. Let $U$, $V$, and $W$ be vector spaces and let $T_1 : V \to W$, $T_2 : V \to W$, and $T_3 : W \to U$ be linear transformations. (See Theorem 4.2.21.) Prove that the following transformations are linear.


(a) $T_1 + T_2 : V \to W$,
(b) $T_1 - T_2 : V \to W$,
(c) $\alpha T_1 : V \to W$, for any scalar $\alpha$,
(d) $T_3 \circ T_1 : V \to U$.


40. Prove that the zero transformation and the identity transformation are both linear. 

41. Using the concepts from this section, prove that the zero transformation and the identity transformation 
are both unique. 

42. Let $V$ and $W$ be vector spaces, $S \subseteq V$, and $T : V \to W$ a linear transformation. Define $T(S) = \{T(s) \mid s \in S\}$. Prove that if $S$ is a subspace of $V$, then $T(S)$ is a subspace of $W$.

43. Let $V$ be an $n$-dimensional vector space with basis $\mathcal{B}$. Define $S : \mathbb{R}^n \to V$ to be the inverse coordinate transformation: for all $u \in \mathbb{R}^n$, $S(u) = v$ where $[v]_{\mathcal{B}} = u$. Prove that $S$ is linear.

44. Describe a scenario in which the difference transformation $(T_1 - T_2)(x) = T_1(x) + (-1)T_2(x)$ could be of interest in a radiography setting.

45. Suggest another image brightening transformation (see Example 4.2.11) that can improve contrast 
in both dark and light areas of an image. Is your transformation linear? 

46. Find a transformation $T : V \to W$ for which $T(x + y) \ne T(x) + T(y)$ for some $x, y \in V$, but $T(ax) = aT(x)$ for all $x \in V$ and all scalars $a$.

47. Consider the vector space of 7-bar LCD images, $D(Z_2)$, and ordered basis

[Figure: the LCD images forming the ordered basis $\mathcal{B}$.]

Define $T : D(Z_2) \to D(Z_2)$ by $T(b_k) = d_k$, where $b_k$ is the $k$th element of $\mathcal{B}$ and $d_k$ is the $k$th element of

[Figure: the LCD images forming the ordered set $D$.]

Find $T$ applied to the LCD image shown. [Figure: LCD image.] Is $D$ a basis for $D(Z_2)$?

Fig. 4.11 Example histogram $J \in J_{12}(\mathbb{R})$ and the result of the re-binning transformation $K = T(J) \in J_6(\mathbb{R})$.


4.3 Explorations: Heat Diffusion 


Recall that in Chapter 1 we introduced the application of diffusion welding. A manufacturing company uses the process of diffusion welding to join several smaller rods into a single longer rod. The diffusion welding process leaves the final rod heated to various temperatures along the rod, with the ends of the rod held at a fixed, relatively cool temperature $T_0$. At regular intervals along the rod, a machine records the temperature difference (from $T_0$), obtaining a set of values which we call a heat state. We assume that the rod is thoroughly insulated except at the ends, so that the major mechanism for heat loss is diffusion through the ends of the rod.

We want to explore this application further. Suppose we have a rod of length $L$ with ends at $x = a$ and $x = b$. Let $T(x, t)$ be the temperature of the rod at position $x \in [a, b]$ and time $t$. Since the ends of the rod are kept at a fixed temperature $T_0$, we have $T(a, t) = T(b, t) = T_0$. Define a function $f : [a, b] \times \mathbb{R} \to \mathbb{R}$ that measures the difference in temperature from the temperature at the ends of the rod at time $t$. That is, $f(x, t) = T(x, t) - T(a, t)$. Notice that $f(a, t) = f(b, t) = 0$. Even though $f$ measures a temperature difference, we will often call the quantity $f(x, t)$ the temperature of the rod at position $x$ and time $t$.

The quantity $f(x, t)$ varies with respect to position $x$ and evolves in time. As time progresses, the heat will spread along the rod, changing the temperature distribution. We can imagine that after a very long time the heat will diffuse along the rod until the temperature is uniform, $\lim_{t \to \infty} f(x, t) = 0$. We can even predict some details on how the heat will diffuse. Consider the illustration in Figure 4.12. The green curve shows a possible temperature profile at some time $t$. The magenta curve shows a temperature profile a short time $\Delta t$ later. We notice that the diffusion will exhibit the following trends.


1. Rod locations where the temperature is higher than the surrounding local area will begin to cool. 
We reason that warm regions would not get warmer unless there is some heat being added to the 
rod at that point. The red (downward) arrow in Figure 4.12 indicates that the warm area begins to 
cool. 

2. Rod locations where the temperature is lower than the surrounding local area will begin to warm. 
We reason that cool regions would not get colder unless there is some heat being removed from 
the rod at that point. The blue (upward) arrow in Figure 4.12 indicates that the cool area begins to 
warm. 

3. Suppose we have two equally warm regions of the rod (e.g., locations $x_1$ and $x_2$ in Figure 4.12). Location $x_1$ has relatively cool areas very nearby, while location $x_2$ does not. The temperature at $x_1$ will cool faster than at $x_2$ because heat is more quickly transferred to the nearby cool regions. Geometrically, we observe that sharply varying temperature differences disappear more quickly than slowly varying temperature differences.

4. The long-term behavior (in time) is that temperatures smooth out and become the same. In this 
case, temperatures approach the function f(x, t) = 0. 


4.3.1 Heat States as Vectors 


It turns out that we can use linear algebra to describe this long-term behavior! In Section 2.4, we 
defined the finite-dimensional vector space of heat states. We will use our knowledge of linear algebra 
to compute the heat state, at any later time, as the heat is redistributed along the rod. In physics, the 
process is known as heat diffusion. The linear algebra formulation is known as heat state evolution. 
We will remind the reader about our initial discussion on heat states from Section 2.4.1 by repeating 
some of the same discussion points here. 

In the introduction to this section, we modeled the temperature profile along a bar by a continuous function $f(x, t)$, which we call the heat signature of the bar. We discretize such a heat signature $f(x, t)$ (in position) by sampling the temperature at $m$ locations along the bar. For any fixed time, these discretized heat signatures are called heat states. If we space the $m$ sampling locations equally, then for $\Delta x = \frac{b - a}{m + 1}$, we can choose the sampling locations to be $a + \Delta x, a + 2\Delta x, \ldots, a + m\Delta x$. (We do not need to sample the temperature at the end points because the temperature at the ends is fixed at $T_0$.) Then, the heat state has the coordinate vector (according to a standard basis) given by the following vector $u$ in $\mathbb{R}^{m+2}$.


\[ u = [0, u_1, u_2, \ldots, u_m, 0] = [f(a), f(a + \Delta x), f(a + 2\Delta x), \ldots, f(a + m\Delta x), f(b)], \]

where we have temporarily suppressed the time dependence for notational clarity. Notice that $f(a) = f(b) = 0$, where $b = a + (m + 1)\Delta x$. These are the familiar heat states that we have seen before. Also, if $u_j = f(x)$ for some $x \in [a, b]$ then $u_{j+1} = f(x + \Delta x)$ and $u_{j-1} = f(x - \Delta x)$. The figure below shows, at some fixed time, a continuous heat signature on the left and the same heat signature with sampling points, to create the heat state, marked on the right.

[Figure: a continuous heat signature (left) and the same heat signature with sampling points marked (right).]

Fig. 4.12 Example 1D temperature profile with higher temperatures on the left end of the rod and lower temperatures on the right. Arrows show the temperature trend predictions at local extrema. The arrows point to a curve showing a temperature profile a short time later.
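The sampling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_heat_state` and the sine-shaped heat signature are hypothetical choices made for this example, and the spacing $\Delta x = (b - a)/(m + 1)$ is taken from the relation $b = a + (m + 1)\Delta x$.

```python
import math

def sample_heat_state(f, a, b, m):
    """Sample f at m equally spaced interior points of [a, b].

    The two end values are set to 0 because the temperature
    difference at the ends of the rod is zero.
    """
    dx = (b - a) / (m + 1)
    interior = [f(a + j * dx) for j in range(1, m + 1)]
    return [0.0] + interior + [0.0]

# Hypothetical heat signature on [0, 1] that vanishes at both ends.
def signature(x):
    return math.sin(math.pi * x)

u = sample_heat_state(signature, 0.0, 1.0, m=4)
print(u)  # a list with m + 2 = 6 entries, zero at both ends
```

Increasing `m` gives a finer heat state for the same heat signature.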


Recall the operations of addition and scalar multiplication in this vector space.

1. We defined scalar multiplication in the usual component-wise fashion. Scalar multiplication results in a change in amplitude only. In the illustration below, the blue heat state is 2 times the red heat state. Heat states appear below as continuous curves, but are actually made up of finitely many (thousands of) points.

2. We defined vector addition in the usual component-wise fashion. Addition can result in changes in both amplitude and shape. In the illustration below, the red heat state is the sum of the blue and green heat states.


4.3.2 Heat Evolution Equation


The discrete heat evolution equation and time evolution transformation given in this section are derived 
in Appendix B. 
Given a heat state at time $t$, $h(t) \in H_m(\mathbb{R})$, the time evolution transformation

\[ U : H_m(\mathbb{R}) \to H_m(\mathbb{R}) \]

is defined by

\[ U(h(t)) = h(t + \Delta t) \]

for discrete time step $\Delta t$. That is, $U$ transforms a heat state at time $t$ to a heat state at time $t + \Delta t$. We can work in the coordinate space relative to the standard heat state basis $\mathcal{B}$, where $u = [h]_{\mathcal{B}}$ and $g : \mathbb{R}^m \to \mathbb{R}^m$ is the transformation that transforms coordinate vectors to coordinate vectors by $g([h]_{\mathcal{B}}) = E[h]_{\mathcal{B}}$; in other words, $g(u) = Eu$. $E$ is the $m \times m$ matrix given by


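\[
E = \begin{pmatrix}
1 - 2\delta & \delta & 0 & \cdots & 0 \\
\delta & 1 - 2\delta & \delta & & \vdots \\
0 & \delta & \ddots & \ddots & 0 \\
\vdots & & \ddots & 1 - 2\delta & \delta \\
0 & \cdots & 0 & \delta & 1 - 2\delta
\end{pmatrix}, \tag{4.3}
\]

where $\delta = \frac{\Delta t}{(\Delta x)^2}$. Notice that $E$ has nonzero entries on the main diagonal and on both adjacent diagonals. All other entries in $E$ are zero. In this coordinate space,

\[ u(t + \Delta t) = E u(t). \]

Here, we note that we need $0 < \delta \le \frac{1}{4}$ for computational stability. Since $\Delta x$ is fixed, we need to take small enough time steps $\Delta t$ to satisfy this inequality. Hence, as we let $\Delta x \to 0$, we are also implicitly forcing $\Delta t \to 0$.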
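To make the evolution rule concrete, here is a small Python sketch that builds a tridiagonal matrix with $1 - 2\delta$ on the main diagonal and $\delta$ on the two adjacent diagonals, then applies it repeatedly. The helper names `evolution_matrix` and `step`, the size $m = 4$, the value $\delta = 0.25$, and the starting vector are all assumptions chosen for illustration.

```python
def evolution_matrix(m, delta):
    """m x m tridiagonal matrix: 1 - 2*delta on the diagonal, delta beside it."""
    E = [[0.0] * m for _ in range(m)]
    for j in range(m):
        E[j][j] = 1.0 - 2.0 * delta
        if j > 0:
            E[j][j - 1] = delta
        if j < m - 1:
            E[j][j + 1] = delta
    return E

def step(E, u):
    """One time step: returns the matrix-vector product E u."""
    return [sum(e * x for e, x in zip(row, u)) for row in E]

E = evolution_matrix(4, 0.25)

# Hypothetical starting coordinates: all heat at the second sample point.
u = [0.0, 4.0, 0.0, 0.0]
history = [u]
for _ in range(3):
    u = step(E, u)
    history.append(u)

for k, state in enumerate(history):
    print(k, state)
```

After one step the heat at the second location has spread to its two neighbors, and later steps continue to flatten the profile.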

It is useful to consider the meaning of the values in the rows and columns of $E$. For example, we might wonder how to interpret the values in a particular column of $E$. The $j$th column shows how the heat at location $j$ at time $t$ is distributed at time $t + \Delta t$. Fraction $(1 - 2\delta)$ of the heat at location $j$ remains at location $j$ and fraction $\delta$ of the heat moves to each of the two nearest neighboring locations $j + 1$ and $j - 1$. So, away from the end points:

\[ u_j(t) = \delta u_{j-1}(t + \Delta t) + (1 - 2\delta) u_j(t + \Delta t) + \delta u_{j+1}(t + \Delta t). \]

Here each term on the right denotes the portion of $u_j(t)$ that arrives at the indicated location at time $t + \Delta t$, not the full value at that location.

We can similarly interpret the values in a particular row of $E$. The $j$th row shows from where the heat at time $t + \Delta t$ came. We have

\[ u_j(t + \Delta t) = \delta u_{j-1}(t) + (1 - 2\delta) u_j(t) + \delta u_{j+1}(t). \]

In particular, fraction $1 - 2\delta$ of the heat at location $j$ was already at location $j$ and fraction $\delta$ came from each of the two nearest neighbors for this location.

We also notice that all but the first and last columns (and rows) of $E$ sum to 1. How can we interpret this observation? Based on the above discussion, we see that this guarantees that no heat is lost from the system except at the end points. Heat is only redistributed (diffused) at points that are away from the end points and is diffused with some loss at the end points. At each iteration, fraction $\delta$ of the heat at $u_1(t)$ and at $u_m(t)$ is lost out the ends of the rod.
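The column-sum observation can be checked numerically. The sketch below is an illustration only: the helper `evolution_matrix`, the size $m = 5$, the value $\delta = 0.25$, and the sample vector are assumptions made for this example. It confirms that interior columns sum to 1, that the two end columns sum to $1 - \delta$, and that the heat lost in one step equals $\delta$ times the sum of the first and last entries.

```python
def evolution_matrix(m, delta):
    """m x m tridiagonal matrix: 1 - 2*delta on the diagonal, delta beside it."""
    E = [[0.0] * m for _ in range(m)]
    for j in range(m):
        E[j][j] = 1.0 - 2.0 * delta
        if j > 0:
            E[j][j - 1] = delta
        if j < m - 1:
            E[j][j + 1] = delta
    return E

m, delta = 5, 0.25
E = evolution_matrix(m, delta)

# Sum each column of E.
col_sums = [sum(E[i][j] for i in range(m)) for j in range(m)]
print(col_sums)

# Hypothetical heat state coordinates.
u = [3.0, 1.0, 4.0, 1.0, 5.0]
u_next = [sum(e * x for e, x in zip(row, u)) for row in E]

# Total heat before minus total heat after one step.
lost = sum(u) - sum(u_next)
print(lost, delta * (u[0] + u[-1]))
```

The two printed numbers on the last line agree, which matches the interpretation that heat leaves the rod only through its ends.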


4.3.3 Exercises 


For Exercises 1 through 8, consider the heat state diffusion transformation $E$ given in Equation 4.3. Suppose we know the heat state $u(0)$ at time $t = 0$. If we want to find the heat state $k$ time steps in the future, $u(k\Delta t)$, we compute $u(k\Delta t) = E^k u(0)$.


1. What is the equation for the heat state 2 steps in the future? 1000 time steps in the future? 

2. Find an explicit expression for $E^2$.

3. Does it look like computing $E^k$ is an easy way to find the heat state at some time far in the future (for example, $k = 1000$ time steps away)?

4. Pick your favorite nontrivial vector $u(0) \in \mathbb{R}^4$ and compute $u(1) = Eu(0)$, $u(2) = Eu(1)$ and $u(3) = Eu(2)$.

5. Does it look like computing $u(k)$ (see Exercise 4) is an easy way to find the heat state at some time, $k \gg 1$, far in the future?

6. Suppose the heat diffusion matrix had the form

\[
G = \begin{pmatrix}
a_1 & b_1 & c_1 & d_1 \\
0 & b_2 & c_2 & d_2 \\
0 & 0 & c_3 & d_3 \\
0 & 0 & 0 & d_4
\end{pmatrix}.
\]

$G$ is upper triangular. How would the computations for the iterative process $u(k) = G^k u(0)$ compare to the heat diffusion process governed by $E$? Would they be simpler or not?

7. Clearly, the computations would be much easier if E was the identity matrix or if E was a matrix 
of all zeros. Why would we not care to discuss an iterative process defined with these matrices? 

8. If you had to perform repeated matrix multiplication operations to compute $u(k\Delta t)$ for large $k$, what matrix characteristics would you prefer a diffusion operator to have?

9. Verify, for $m = 6$, that multiplication by $D_2$ results in the same heat state as using the formula given in Equation B.1 from Appendix B.

10. Following similar reasoning as in Equation (B.1) shows that the discretization of the time derivative can be approximated at the $j$th sampling point on the rod by

\[ \frac{\partial}{\partial t} u_j(t) \approx \frac{u_j(t + \Delta t) - u_j(t)}{\Delta t}. \]


4.3.4 Extending the Exploration: Application to Image Warping


Recall that in Chapter 1 we briefly discussed the application of Image Warping. The idea is that we want to create a video that transitions smoothly from one image to the next. We see such a transition in Figure 4.13.

Fig. 4.13 Ten images in the warping sequence that begins with a young boy on a tractor and ends with the same boy a little older.

Fig. 4.14 Two heat signatures $v$ (smooth curve) and $w$ (jagged curve), plotted against $x$. We will evolve $v$ into $w$.


In this part of the heat diffusion exploration, we will consider the process of image warping by first 
looking at the process on 1D heat signatures and then discussing how this extends to 2D images. 


Morphing 1D heat signatures 
Suppose that we start with two heat signatures, $v$ and $w$, and we want to heat evolve one into the other. One way to do this is to heat evolve the difference between the two heat signatures, $v - w$.

In the following, we will explore this process with the heat signatures $v$ and $w$ in Figure 4.14.


1. On a new set of axes, sketch the difference $v - w$.

2. Use your intuition about heat diffusion to sketch some (at least 3) future heat signatures that might arise from evolving $v - w$ for different lengths of time. (Add these to your sketch from the previous part.) Define a family of heat signatures $x_t$, where $x_0 = v - w$ is the initial heat signature, and $x_t$ is the heat signature after time $t$. The heat signatures in your drawings that occur earlier in the evolution are associated with lower values of $t$ than the ones that occur later. Label your heat signatures with this notation, indicating the order based on $t$.

3. Add a sketch of the limiting heat signature, $x_\infty$, that results from evolving $v - w$.

4. Now discuss how the sketches you made might help you to morph the function $v$ into the function $w$. (Hint: Consider the related family of heat signatures, where for $t \ge 0$, $z_t := w + x_t$. On a new set of axes, sketch the graphs of $z_t$ that correspond to each of the $x_t$ you sketched. Also include $z_0$ and $z_\infty$ corresponding to $x_0$ and $x_\infty$. What is the result of morphing $z_0$ to $z_\infty$?)


Observations 

5. The family of functions $z_t$ shows a heat signature morphing into another heat signature. What do you notice about this morphing? In particular, what types of features disappear and appear?

6. Another method for morphing heat signature $v$ into heat signature $w$ is to first diffuse $v$ and then follow this with the inverse of the diffusion of $w$. How is this different from what we obtained above by diffusing the signature $v - w$? (You may produce sketches of the heat evolutions of $v$ and $w$ to support your discussion.)
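For readers who want to experiment numerically after sketching by hand, the family $z_t = w + x_t$ can be simulated on small discretized heat states. The sketch below is one possible illustration and not the method of the text: the function name `diffuse`, the two five-entry vectors, the value $\delta = 0.25$, and the number of steps are all hypothetical choices. Each step applies the discrete rule $u_j(t + \Delta t) = \delta u_{j-1}(t) + (1 - 2\delta)u_j(t) + \delta u_{j+1}(t)$ with zero temperature difference at both ends.

```python
def diffuse(x, delta):
    """One diffusion step on interior values, with zero values at both ends."""
    m = len(x)
    out = []
    for j in range(m):
        left = x[j - 1] if j > 0 else 0.0
        right = x[j + 1] if j < m - 1 else 0.0
        out.append(delta * left + (1.0 - 2.0 * delta) * x[j] + delta * right)
    return out

# Hypothetical discretized heat signatures (interior sample values only).
v = [1.0, 2.0, 3.0, 2.0, 1.0]   # smooth profile
w = [0.0, 3.0, 0.0, 3.0, 0.0]   # jagged profile

# x_0 = v - w is the difference that gets diffused.
x = [vi - wi for vi, wi in zip(v, w)]

frames = []  # frames[t] holds z_t = w + x_t
for t in range(201):
    frames.append([wi + xi for wi, xi in zip(w, x)])
    x = diffuse(x, 0.25)

print(frames[0])    # equals v
print(frames[-1])   # close to w
```

Printing a few intermediate entries of `frames` shows which features of the starting profile fade first.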


Morphing Images 
We now consider the task of morphing a 2D grayscale image into another image.¹

Such an image can be thought of as consisting of pixels in a rectangular array with intensity values 
assigned to each pixel: white pixels have high intensity (heat) and black pixels have low intensity 
(heat). As in the 1D case, we will assume the boundary intensities are zero. Boundary pixels are those 
on the border of the rectangle. 

The principles behind heat diffusion in two dimensions are similar to those we have been discussing 
in one dimension. In particular, 


• Places where the temperature is higher than the surrounding area will cool.

• Places where the temperature is cooler than the surrounding area will warm.

• As the heat diffuses, the sharpest temperature differences disappear first.

• The long-term behavior is that temperatures smooth out, approaching 0 heat everywhere.


Hence, if we take an image and heat diffuse it, the final heat state of the image will be the black (zero) image. We define a morphing between image $V$ and image $W$ as in the 1D case above: by diffusing the difference $V - W$ and considering the associated family $X_t$ as before.


7. In Exercise 17 of Section 2.3, we determined that as long as we have images of the same size, we 
can perform pixelwise addition. The process of heat diffusion can be modeled for the difference 
of images, but in two dimensions. 

8. Given an $m \times n$ image (a total of $mn$ pixels), what additional properties should we require in order to consider the image as a heat state? In particular, what should be true at the boundary of heat state images?

9. It turns out that one can construct a matrix $\tilde{E}$ so that the heat evolution operator $T : I_{m \times n} \to I_{m \times n}$ on images can be written as $T(V) = \tilde{E}V$.

10. Above, we observed that the diffusion matrix for 1D heat states sends heat from warmer regions 
to cooler regions, raising the temperature there. How can we think of the pixel intensities for 
a difference image as heat values? With this thought, what analogous ideas can we apply to a 
difference image so that warmer regions lose heat to cooler regions? 


11. Consider the structure of the 1D heat diffusion matrix $E$. What similarities would the matrix $\tilde{E}$ for images have with $E$? Describe the features of such a matrix $\tilde{E}$. (It may help to consider the interpretation of the rows and columns of $E$ found in Section 4.3.2.)

12. (Challenge Question) Fix a basis (say the standard basis) for the space of $4 \times 4$ heat state images and determine the matrix $\tilde{E}$.

¹ Recall that most images we encounter are color images, but up to this point of the text, we have considered only grayscale images. Morphing a color image is done by considering the red, green, and blue components of the image separately and then combining them.


4.4 Matrix Representations of Linear Transformations 


In our studies of the radiographic transformation and of the heat state evolution transformation, we have modeled the transformations with the matrix multiplication operation. In reality, brain images, heat states, and radiographs are somewhat abstract objects which do not lend themselves to matrix operations. However, we now have tools for numerical operations based on coordinate vectors in familiar vector spaces ($\mathbb{R}^n$). Between these coordinate spaces, matrix operators serve as proxies for the transformations of interest. In this section, we will solidify these general ideas. We need to know which types of transformations can be written in terms of a matrix multiplication proxy. We also want to keep sight of our radiography goal: does an inverse transformation exist, and if so, what is its relationship to the proxy matrix?

The main result of this section is the fact that every linear transformation between finite-dimensional vector spaces can be uniquely represented, once bases are chosen, as a left matrix multiplication operation between coordinate spaces. Consider the illustrative graphic in Figure 4.15. Consider two vector spaces $V$ and $W$ with $\dim(V) = n$ and $\dim(W) = m$. We can think of a linear transformation $T : V \to W$ as a composition of three transformations: $T = T_3 \circ T_2 \circ T_1$. The first transformation, $T_1$, maps $v \in V$ to its coordinate vector $[v]_{\mathcal{B}_V} \in \mathbb{R}^n$. The next transformation, $T_2$, maps the coordinate vector $[v]_{\mathcal{B}_V}$ to the coordinate vector $[w]_{\mathcal{B}_W}$. And finally, $T_3$ maps the coordinate vector $[w]_{\mathcal{B}_W}$ to the vector $w \in W$. Instead of directly computing $w = T(v)$, our explorations with radiography and heat state evolution have been with the coordinate space transformation $[w]_{\mathcal{B}_W} = T_2([v]_{\mathcal{B}_V}) = M[v]_{\mathcal{B}_V}$, where $M \in \mathcal{M}_{m \times n}(\mathbb{R})$. The transformation $T_2$ is shown in red to indicate that we have yet to answer questions concerning its existence. The coordinate transformation $T_1$ exists and is linear. The inverse coordinate transformation $T_3$ exists and is linear (see Exercise 43 of Section 4.2). Table 4.1 shows some of the correspondences that we have already used.

So far, we have been able to find a matrix $M$ so that $T(v)$ is represented by $Mx$ for $x = [v]_{\mathcal{B}_V}$ in the coordinate space of $V$ relative to some basis $\mathcal{B}_V$. That is, $Mx$ is the coordinate vector of $T(v)$ relative to a basis of $W$, say $\mathcal{B}_W$. But is it always possible to find such a matrix $M$? Figure 4.16 shows in red the transformations whose existence we have not yet established. In Section 4.2, we showed that the coordinate transformations $S_1$ and $S_3$ exist and are linear. We do not yet know what conditions, if any, ensure the existence of the transformations $S$ or $S_2$. The transformation $S$ (if it exists) is an inverse transformation in that $S \circ T$ is the identity transformation on $V$ and $T \circ S$ is the identity transformation on $W$. The question of the existence of $S$ is equivalent to understanding the existence of $S_2$ because $S = S_1 \circ S_2 \circ S_3$. In this section, we will discuss the answer about the existence of $T_2$ in Figure 4.15 and build some tools to answer the remaining questions.


4.4.1 Matrix Transformations between Euclidean Spaces 


In the situation where both the domain and the codomain are Euclidean spaces, the situation of Figure 4.15 is simpler. We saw, in Section 4.2, that multiplication by any $m \times n$ matrix is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. It turns out that every linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be represented as multiplication by an $m \times n$ matrix.

Table 4.1 Three examples of vectors, corresponding coordinate vectors, and matrix representations of a transformation. The radiography example is that of Section 4.1.4. The heat diffusion example is for $m = 4$ with a diffusion parameter $\delta = \frac{1}{4}$. The polynomial differentiation example is for transformation $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ defined by $T(p(x)) = p'(x)$. In all cases, we use the standard bases $\mathcal{B}_V$ and $\mathcal{B}_W$.

[Table 4.1: columns Radiography, Heat Diffusion, Polynomial Differentiation; rows $v$, $[v]_{\mathcal{B}_V}$, $M$, $[w]_{\mathcal{B}_W}$, $w$. Entries are not recoverable from the source.]

Fig. 4.15 Illustration of the equivalence of linear transformation $T$ with the composition of two coordinate transformations $T_1$ and $T_3$ and one matrix product $T_2([v]_{\mathcal{B}_V}) = M[v]_{\mathcal{B}_V} = [w]_{\mathcal{B}_W}$.

Fig. 4.16 Illustration of the equivalence of linear transformation $T$ and a possible inverse transformation $S$. Compare to Figure 4.15.

Definition 4.4.1 


Given the linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$, we say that $M \in \mathcal{M}_{m \times n}(\mathbb{R})$ is the matrix representation for the transformation $T$ if $T(x) = Mx$ for every $x \in \mathbb{R}^n$.


Lemma 4.4.2
Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Then there exists a unique $m \times n$ matrix $A$ such that $T(x) = Ax$ for all $x \in \mathbb{R}^n$.


Proof. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. By Theorem 4.2.26, we know that $T$ is uniquely determined by how $T$ transforms the basis elements. Consider the $m \times n$ matrix

\[
A = \begin{pmatrix}
| & | & & | \\
T(e_1) & T(e_2) & \cdots & T(e_n) \\
| & | & & |
\end{pmatrix}
\]

whose columns are the outputs of the $n$ standard basis elements of $\mathbb{R}^n$, written with respect to the standard basis on $\mathbb{R}^m$. One can verify that $Ae_i = T(e_i)$ for $i = 1, \ldots, n$. (See Exercise 24.) Hence, by Corollary 4.2.27, $T(x) = Ax$ for all $x \in \mathbb{R}^n$, and so $A$ is a matrix representation of $T$.

To see that $A$ is the only matrix that represents $T$, we recognize that any matrix $B$ that represents $T$ must map (via multiplication) the standard basis of $\mathbb{R}^n$ to the same vectors that $T$ maps them to. Hence, the $i$th column of $B$ must be $T(e_i)$ (written with respect to the standard basis).


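Example 4.4.3 Consider the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ given by

\[ T\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + y \\ 2z + y - x \end{pmatrix}. \]

The matrix representation of $T$ has columns $T(e_1) = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, $T(e_2) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, and $T(e_3) = \begin{pmatrix} 0 \\ 2 \end{pmatrix}$. We verify with a computation that

\[ \begin{pmatrix} 1 & 1 & 0 \\ -1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + y \\ 2z + y - x \end{pmatrix}, \]

as desired.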
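The construction in the proof of Lemma 4.4.2 (use $T(e_1), \ldots, T(e_n)$ as columns) can be carried out mechanically. The Python sketch below assumes that the transformation of this example is $T(x, y, z) = (x + y,\; 2z + y - x)$, as the final computation of the example indicates; the helper names `matrix_of` and `matvec` are hypothetical and exist only for this illustration.

```python
def T(vec):
    """Assumed transformation: (x, y, z) -> (x + y, 2z + y - x)."""
    x, y, z = vec
    return [x + y, 2 * z + y - x]

def matrix_of(transform, n):
    """Build the matrix whose i-th column is transform(e_i)."""
    columns = []
    for i in range(n):
        e = [0] * n
        e[i] = 1          # i-th standard basis vector of R^n
        columns.append(transform(e))
    # Turn the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]

def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = matrix_of(T, 3)
print(A)

# Compare A x with T(x) on a sample vector.
sample = [3, -2, 5]
print(matvec(A, sample), T(sample))
```

Because both $T$ and multiplication by `A` are linear and agree on the standard basis, they agree on every vector, not only on the sample shown.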


4.4.2. Matrix Transformations 


Suppose we have two vector spaces $V$ and $W$. Let $V$ be $n$-dimensional with basis $\mathcal{B}_V$ and $W$ be $m$-dimensional with basis $\mathcal{B}_W$. Suppose we are given a linear transformation $T : V \to W$. We are interested in finding a way to transform vectors from $V$ to $W$, possibly taking a new path using matrix multiplication. Recall that the transformation $T_1 : V \to \mathbb{R}^n$ defined by $T_1(v) = [v]_{\mathcal{B}_V}$ is linear (see Theorem 4.2.16). Let $T_3$ be the transformation that takes coordinate vectors in $\mathbb{R}^m$ back to their corresponding vectors in $W$. We know that $T_3$ is a linear transformation (see Exercise 43 of Section 4.2). We also know that we can multiply vectors in $\mathbb{R}^n$ by $m \times n$ matrices to get vectors in $\mathbb{R}^m$. We want to find $M \in \mathcal{M}_{m \times n}(\mathbb{R})$ and define

\[ T_2 : \mathbb{R}^n \to \mathbb{R}^m \text{ by } T_2(x) = Mx \text{ for all } x \in \mathbb{R}^n \]

so that

\[ T_2([v]_{\mathcal{B}_V}) = M[v]_{\mathcal{B}_V} = [T(v)]_{\mathcal{B}_W} \text{ for all } v \in V. \]

That is, we want $T_2$ to transform $[v]_{\mathcal{B}_V}$ into $[w]_{\mathcal{B}_W}$ in the same way $T$ transforms $v$ into $w$. (See Figure 4.15.)

First, we answer our central questions concerning the existence of $S_2$ and of $S$, as well as their relationship to $T$ and the coordinate transformations. We will show that if $T$ is linear, then the matrix representation $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$ not only exists but is uniquely determined by $T$, $\mathcal{B}_V$ and $\mathcal{B}_W$. As you read through the next two lemmas and their proofs, use Figure 4.16 to keep the transformations organized in your mind.


Lemma 4.4.4 
Let $V$ and $W$ be vector spaces and let $T : V \to W$ be a linear transformation. If $\dim V = n$ and $\dim W = m$, then there exists a linear transformation $T_2 : \mathbb{R}^n \to \mathbb{R}^m$ so that for all $v \in V$

\[ T_2([v]_{\mathcal{B}_V}) = [T(v)]_{\mathcal{B}_W} \tag{4.4} \]

for any bases $\mathcal{B}_V$ and $\mathcal{B}_W$ of $V$ and $W$, respectively.


Proof. Suppose $T : V \to W$ is linear. Suppose also that $\mathcal{B}_V$ and $\mathcal{B}_W$ are bases for $V$ and $W$. Let $v \in V$ and define $w = T(v) \in W$. We will show that there is a linear transformation $T_2 : \mathbb{R}^n \to \mathbb{R}^m$ satisfying (4.4) above. We know, by Exercise 43 of Section 4.2, that there exist a coordinate transformation

\[ S_3 : W \to \mathbb{R}^m \text{ so that } S_3(w) = [w]_{\mathcal{B}_W} \]

and an inverse coordinate transformation

\[ S_1 : \mathbb{R}^n \to V \text{ so that } S_1([v]_{\mathcal{B}_V}) = v. \]

By Theorem 4.2.21 and the linearity of $T$, we know that $S_3 \circ T \circ S_1$ is also linear. Now, we need only show that if we define $T_2 = S_3 \circ T \circ S_1$, then $T_2$ satisfies (4.4) above. Using Definition 4.2.25, we have

\begin{align*}
T_2([v]_{\mathcal{B}_V}) &= (S_3 \circ T \circ S_1)([v]_{\mathcal{B}_V}) \\
&= S_3(T(S_1([v]_{\mathcal{B}_V}))) \\
&= S_3(T(v)) \\
&= S_3(w) \\
&= [w]_{\mathcal{B}_W}.
\end{align*}

Therefore, $T_2$ is a linear transformation that satisfies (4.4).


We continue establishing the existence of each of the transformations in Figure 4.16. 


Lemma 4.4.5 
Let $V$ and $W$ be vector spaces with $\dim V = n$ and $\dim W = m$. Suppose there is a linear transformation $T : V \to W$. Then, there exists a linear transformation

\[ S : W \to V \text{ with } S(T(v)) = v, \text{ for all } v \in V \]

if and only if there exists a linear transformation

\[ S_2 : \mathbb{R}^m \to \mathbb{R}^n \text{ with } S_2([T(v)]_{\mathcal{B}_W}) = [v]_{\mathcal{B}_V}, \text{ for all } v \in V. \]

Proof. Suppose $V$ and $W$ are vector spaces with $\dim V = n$ and $\dim W = m$. Suppose $T : V \to W$ is linear. Suppose, also, that $v \in V$. Define $w = T(v) \in W$. We have shown that there exist linear coordinate transformations $T_1 : V \to \mathbb{R}^n$ and $S_3 : W \to \mathbb{R}^m$. We also know that there exist linear inverse coordinate transformations $S_1 : \mathbb{R}^n \to V$ and $T_3 : \mathbb{R}^m \to W$.

($\Rightarrow$) First, suppose $S$, as described above, exists and define $S_2 = T_1 \circ S \circ T_3$. By Theorem 4.2.21, we know that $S_2$ is linear. We need only show that $S_2([w]_{\mathcal{B}_W}) = [v]_{\mathcal{B}_V}$. We know that

\begin{align*}
S_2([w]_{\mathcal{B}_W}) &= (T_1 \circ S \circ T_3)([w]_{\mathcal{B}_W}) \\
&= T_1(S(T_3([w]_{\mathcal{B}_W}))) \\
&= T_1(S(w)) \\
&= T_1(v) \\
&= [v]_{\mathcal{B}_V}.
\end{align*}

Thus, $S_2 = T_1 \circ S \circ T_3$ is a linear transformation with $S_2([w]_{\mathcal{B}_W}) = [v]_{\mathcal{B}_V}$.

($\Leftarrow$) Now, suppose $S_2$, as described above, exists and define $S = S_1 \circ S_2 \circ S_3$. Again, we know that, by Theorem 4.2.21, $S$ is linear. Now, we know that

\begin{align*}
S(w) &= (S_1 \circ S_2 \circ S_3)(w) \\
&= S_1(S_2(S_3(w))) \\
&= S_1(S_2([w]_{\mathcal{B}_W})) \\
&= S_1([v]_{\mathcal{B}_V}) \\
&= v.
\end{align*}

Therefore, $S = S_1 \circ S_2 \circ S_3$ is a linear transformation so that $S(w) = v$.


Theorem 4.4.6 
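Let $V$ and $W$ be finite-dimensional vector spaces with bases $\mathcal{B}_V$ and $\mathcal{B}_W$, respectively. Let $T : V \to W$. If $T$ is linear then there exists a unique matrix $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$ so that $M[v]_{\mathcal{B}_V} = [T(v)]_{\mathcal{B}_W}$ for all $v \in V$.


Proof. Suppose $V$ and $W$ are finite-dimensional vector spaces with bases $\mathcal{B}_V = \{y_1, y_2, \ldots, y_n\}$ and $\mathcal{B}_W = \{z_1, z_2, \ldots, z_m\}$, respectively. Let $T : V \to W$ be linear. Let $v \in V$ and define $w = T(v) \in W$. We will show that there exists a unique matrix $M$ such that $M[v]_{\mathcal{B}_V} = [w]_{\mathcal{B}_W}$. We can express $v$ and $w$, uniquely, as linear combinations of basis vectors, that is, there are scalars $a_k$ and $c_j$ so that

\[ v = \sum_{k=1}^{n} a_k y_k \quad \text{and} \quad w = \sum_{j=1}^{m} c_j z_j. \]

Now, since $T$ is linear, we have

\begin{align*}
\sum_{j=1}^{m} c_j z_j &= w \\
&= T(v) \\
&= T\left( \sum_{k=1}^{n} a_k y_k \right) \\
&= \sum_{k=1}^{n} a_k T(y_k).
\end{align*}

Since the codomain of $T$ is $W$, we can express each vector $T(y_k)$ as a linear combination of basis vectors in $\mathcal{B}_W$. That is, there are scalars $b_{jk}$ so that

\[ T(y_k) = \sum_{j=1}^{m} b_{jk} z_j. \]

Therefore, we have

\[ \sum_{j=1}^{m} c_j z_j = \sum_{k=1}^{n} a_k \left( \sum_{j=1}^{m} b_{jk} z_j \right) = \sum_{j=1}^{m} \left( \sum_{k=1}^{n} b_{jk} a_k \right) z_j. \]

So $c_j = \sum_{k=1}^{n} b_{jk} a_k$, for all $j = 1, 2, \ldots, m$. We can then combine these unique coefficient relationships as the matrix equation

\[
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix}
=
\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots & & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mn}
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.
\]

Thus, we have

\[
[w]_{\mathcal{B}_W} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix} = M \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = M[v]_{\mathcal{B}_V},
\]

where

\[
M = \begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots & & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mn}
\end{pmatrix}.
\]

Notice that the entries in $M$ are determined by $T$ and the basis vectors in $\mathcal{B}_V$ and $\mathcal{B}_W$, not the specific vectors $v$ or $w$. Thus, $M$ is unique for the given bases.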
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The unique matrix given in Theorem 4.4.6 is the matrix representation of T. 


Definition 4.4.7

Let $V$ and $W$ be vector spaces with $\dim V = n$ and $\dim W = m$. Given the linear transformation $T : V \to W$, we say that the $m \times n$ matrix $M$ is the matrix representation of $T$ with respect to bases $B_V$ and $B_W$ if $[T(x)]_{B_W} = M[x]_{B_V}$ for every $x \in V$.


It is common to indicate the matrix representation $M$ of a linear transformation $T : V \to W$ by $M = [T]_{B_V}^{B_W}$, where $B_V$ and $B_W$ are bases for $V$ and $W$, respectively. If $V$ and $W$ are the same vector space, and we choose to use the same basis $B$, then we typically write $M = [T]_B$ to indicate $M = [T]_B^B$.

The proof of Theorem 4.4.6 suggests a method for constructing a matrix representation for a linear transformation between finite-dimensional vector spaces, given a basis for each. The $k$th column of $M$ is the coordinate vector, relative to $B_W$, of the transformed $k$th basis vector of $B_V$. That is, $M_k = [T(y_k)]_{B_W}$.


Corollary 4.4.8
Let $V$ and $W$ be vector spaces with ordered bases $B_V = \{y_1, y_2, \ldots, y_n\}$ and $B_W$, respectively. Also, let $T : V \to W$ be linear. Then the matrix representation $M = [T]_{B_V}^{B_W}$ is given by

$$M = \begin{pmatrix} | & | & & | \\ [T(y_1)]_{B_W} & [T(y_2)]_{B_W} & \cdots & [T(y_n)]_{B_W} \\ | & | & & | \end{pmatrix}, \qquad (4.5)$$

where $[T(y_k)]_{B_W}$ is the $k$th column of $M$.

Proof. The proof follows directly from the proof of Theorem 4.4.6.


Example 4.4.9 Let $V = \{ax^2 + bx + (a+b) \mid a, b \in \mathbb{R}\}$ and let $W = \mathcal{M}_{2\times 2}(\mathbb{R})$. Consider the transformation $T : V \to W$ defined by

$$T(ax^2 + bx + (a+b)) = \begin{pmatrix} a & b-a \\ a+b & a+2b \end{pmatrix}.$$

The reader can verify that $T$ is linear. We seek a matrix representation, $M$, of $T$. Recall that the columns of the matrix representation $M$ of $T$ are coordinate vectors in $W$ according to some basis $B_W$. Thus, we must choose a basis for $W = \mathcal{M}_{2\times 2}(\mathbb{R})$. We must also apply the transformation $T$ to basis elements for a basis of $V$. Therefore, we must also choose a basis for $V$. Now, $T : V \to \mathcal{M}_{2\times 2}(\mathbb{R})$, so we will choose $B_W$ to be the standard basis for $\mathcal{M}_{2\times 2}(\mathbb{R})$. To find the basis $B_V$, we write $V$ as the span of linearly independent vectors. In fact, we can see that

$$V = \{ax^2 + bx + (a+b) \mid a, b \in \mathbb{R}\} = \text{Span}\{x^2 + 1,\ x + 1\}.$$

So we define the basis $B_V = \{x^2 + 1,\ x + 1\}$. Since $V$ is a 2D space, the corresponding coordinate space is $\mathbb{R}^2$. The coordinate space for the 4-dimensional space $W$ is $\mathbb{R}^4$. Thus, the matrix representation $M$ is a $4 \times 2$ matrix. We also want $M$ to act like $T$. That is, we want $[T(v)]_{B_W} = M[v]_{B_V}$. We need to determine where the basis elements of $V$ get mapped. By the definition of $T$, we have

$$T(x^2 + 1) = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$$

and

$$T(x + 1) = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}.$$

And the corresponding coordinate vectors in $\mathbb{R}^4$ are

$$[T(x^2 + 1)]_{B_W} = \left[\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\right]_{B_W} = \begin{pmatrix} 1 \\ -1 \\ 1 \\ 1 \end{pmatrix}$$

and

$$[T(x + 1)]_{B_W} = \left[\begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}\right]_{B_W} = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 2 \end{pmatrix}.$$

According to Corollary 4.4.8,

$$M = \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}.$$

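We can (and should) check that the transformation $\tilde{T} : \mathbb{R}^2 \to \mathbb{R}^4$ defined by $\tilde{T}(x) = Mx$ transforms the coordinate vectors in the same way $T$ transforms vectors. Let us test our matrix representation on $v = 2x^2 + 4x + 6$. We know that $v \in V$ because it corresponds to the choices $a = 2$ and $b = 4$. According to the definition for $T$, we get

$$T(v) = T(2x^2 + 4x + (2+4)) = \begin{pmatrix} 2 & 4-2 \\ 2+4 & 2+2(4) \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 6 & 10 \end{pmatrix}.$$

Notice that $v = 2(x^2 + 1) + 4(x + 1)$. So

$$[v]_{B_V} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}.$$

Finally, we check that multiplying $[v]_{B_V}$ by $M$ gives $[T(v)]_{B_W}$. Indeed,

$$M[v]_{B_V} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 6 \\ 10 \end{pmatrix} = \left[\begin{pmatrix} 2 & 2 \\ 6 & 10 \end{pmatrix}\right]_{B_W} = [T(v)]_{B_W}.$$

We can check this for the general case by using an arbitrary vector $v \in V$. Let $v = ax^2 + bx + a + b$ for some $a, b \in \mathbb{R}$. Then

$$T(v) = \begin{pmatrix} a & b-a \\ a+b & a+2b \end{pmatrix}.$$

We see that

$$M[v]_{B_V} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ b-a \\ a+b \\ a+2b \end{pmatrix} = [T(v)]_{B_W}.$$

Thus, as expected, $[T(v)]_{B_W} = M[v]_{B_V}$.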
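
The column-by-column construction of Corollary 4.4.8 can also be checked numerically. The following is a minimal Python sketch, not part of the original text, for the transformation of Example 4.4.9. It assumes that a vector $ax^2 + bx + (a+b)$ in $V$ is stored as the pair `(a, b)` of its coordinates relative to $B_V = \{x^2+1, x+1\}$, and that $2 \times 2$ matrices are flattened row by row to give standard-basis coordinates. The names `T`, `M`, and `matvec` are illustrative choices.

```python
def T(a, b):
    """Coordinates of T(ax^2 + bx + (a+b)) in the standard basis of the
    2x2 matrices, with the matrix entries read row by row (assumed ordering)."""
    return [a, b - a, a + b, a + 2 * b]

# B_V = {x^2 + 1, x + 1} corresponds to (a, b) = (1, 0) and (0, 1).
basis_V = [(1, 0), (0, 1)]

# Corollary 4.4.8: the k-th column of M is the coordinate vector of T(y_k).
columns = [T(a, b) for (a, b) in basis_V]
M = [[col[i] for col in columns] for i in range(4)]

def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in matrix]

# Compare M[v] with [T(v)] for v = 2x^2 + 4x + 6, that is, (a, b) = (2, 4).
print(M)
print(matvec(M, [2, 4]), T(2, 4))
```

Because `T` is linear in `(a, b)`, agreement on the two basis vectors already forces agreement on every vector in $V$; the final line is only a spot check.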


Next, we apply this procedure to a small radiographic transformation. In Section 4.1, we constructed some matrix representations for radiographic transformations with a few object voxels and radiograph pixels. In the next example, we consider a similar scenario, constructing the transformation $T$ using Corollary 4.4.8.


Example 4.4.10 Consider the following radiographic scenario.

• Height and width of the image in voxels: $n = 2$ (Total voxels $N = 4$)
• Pixels per view in the radiograph: $m = 2$
• ScaleFac $= \sqrt{2}$
• Number of views: $a = 2$
• Angle of the views: $\theta_1 = 0^\circ$, $\theta_2 = 135^\circ$

There are two radiographic views of two pixels each and the object space has four voxels as shown here:

We will use the standard bases for both the object and radiograph spaces. That is,


1/0 0 | 0 0] 1 0 
Bo= ; ’ ’ ’ 
0 | 0 1/0 0] 0 0} 1 


“% ™ SN NI 


1 0O 0 1 0 0 0 0 


oO 


The columns of the matrix representation of the transformation are found using Corollary 4.4.8 as 


follows. 
1 
che [ ~ 1 
0 
Br 1 
1 0 e 
Ua 
i; 1 
r 0/0 p 0) 
1| 0 7 Ya}? 
Br Vp 
1 0O 


Br 
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| ee Ag 
(iyL-| |G 


R If 

0 1 i 

Na] 
Ni ¢) 
0 | 0 1 

T = 

Oj} 1 1 
Br 0 

01 ee 


IT Ip5 = 


Watch Your Language! The matrix representation is not unique, but rather depends upon the chosen bases.

Let $V$ and $W$ be vector spaces with bases $B_V$ and $B_W$, respectively. Suppose $T : V \to W$ is a linear transformation with matrix representation $M$. We list key relationships and the corresponding linear algebra language used to discuss such relationships. (Pay attention to which statements are correct to say and which are incorrect.)

$T(v) = w$.

✓ $T$ transforms $v$ to $w$.
✓ $w$ is the transformation of $v$ under $T$.
✓ $T$ of $v$ is $w$.
✗ $T$ times $v$ is $w$.

$M[v]_{B_V} = [w]_{B_W}$.

✓ The coordinate vector of $v$ is transformed to the coordinate vector of $w$ through multiplication by the matrix $M$.
✓ The coordinate vector of $w$ is the transformation of the coordinate vector of $v$ using multiplication by the matrix $M$.
✓ $M$ times the coordinate vector of $v$ is the coordinate vector of $w$.
✓ $M$ transforms the coordinate vector of $v$ to the coordinate vector of $w$.
✗ $M$ transforms $v$ to $w$.

✓ $M$ is a matrix representation of the linear transformation $T$.
✓ $M$ is the matrix representation of the linear transformation $T$ corresponding to the bases $B_V$ and $B_W$. (In appropriate context, we often say, "$M$ is the matrix representation of $T$.")
✗ $M$ is equal to $T$.
✗ $M$ is the linear transformation that transforms vectors in $V$ to vectors in $W$ in the same way $T$ does.


4.4.3. Change of Basis Matrix 


Consider brain images represented in coordinate space $\mathbb{R}^n$ relative to a basis $B_0$. Perhaps this basis is the standard basis for brain images. Now suppose that we have another basis $B_1$ for the space of brain images for which $v_{431}$ is a brain image strongly correlated with disease X. If a brain image $x$ is represented as a coordinate vector $[x]_{B_0}$, it may be relatively simple to perform necessary calculations, but it may be more involved to diagnose if disease X is present. However, the 431st coordinate of $[x]_{B_1}$ tells us directly the relative contribution of $v_{431}$ to the brain image. Ideas such as this illustrate the benefits of being able to quickly change our coordinate system.

Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be the change of coordinates transformation from ordered basis $B = \{b_1, b_2, \ldots, b_n\}$ to ordered basis $\tilde{B} = \{\tilde{b}_1, \tilde{b}_2, \ldots, \tilde{b}_n\}$. We represent the transformation as a matrix $M = [T]_B^{\tilde{B}}$. The key idea is that a change of coordinates does not change the vectors themselves, only their coordinate representation. Thus, $T$ must be the identity transformation. Using Corollary 4.4.8, we have

$$[T]_B^{\tilde{B}} = \begin{pmatrix} | & | & & | \\ [T(b_1)]_{\tilde{B}} & [T(b_2)]_{\tilde{B}} & \cdots & [T(b_n)]_{\tilde{B}} \\ | & | & & | \end{pmatrix} = \begin{pmatrix} | & | & & | \\ [b_1]_{\tilde{B}} & [b_2]_{\tilde{B}} & \cdots & [b_n]_{\tilde{B}} \\ | & | & & | \end{pmatrix}.$$

Definition 4.4.11

Let $B$ and $\tilde{B}$ be two ordered bases for a vector space $V$. The matrix representation $[I]_B^{\tilde{B}}$ for the transformation changing coordinate spaces is called a change of basis matrix.


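The $k$th column of the change of basis matrix is the coordinate representation of the $k$th basis vector of $B$ relative to the basis $\tilde{B}$.

Example 4.4.12 Consider an ordered basis $B$ for $\mathbb{R}^3$ given by

$$B = \left\{ v_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},\ v_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},\ v_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}.$$

Find the change of basis matrix $M$ from the standard basis $B_0$ for $\mathbb{R}^3$ to $B$. We have

$$M = \begin{pmatrix} | & | & | \\ [e_1]_B & [e_2]_B & [e_3]_B \\ | & | & | \end{pmatrix}.$$

We can find $[e_1]_B$ by finding scalars $a, b, c$ so that

$$e_1 = a v_1 + b v_2 + c v_3.$$

Solving the corresponding system of equations, we get $a = 1$, $b = 1$, $c = -1$. So,

$$[e_1]_B = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}.$$

Similarly, we find that

$$[e_2]_B = \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \quad \text{and} \quad [e_3]_B = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.$$

Thus,

$$M = \begin{pmatrix} 1 & 0 & -1 \\ 1 & -1 & 0 \\ -1 & 1 & 1 \end{pmatrix}.$$

Now, we can write any given coordinate vector (with respect to the standard basis) $v = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$ as a coordinate vector in terms of $B$ as

$$[v]_B = M v = \begin{pmatrix} x - z \\ x - y \\ -x + y + z \end{pmatrix}.$$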
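
Since the change of basis matrix from the standard basis to $B$ undoes the matrix whose columns are the vectors of $B$, it can be computed by a single matrix inversion. The Python sketch below is an illustration under stated assumptions rather than the text's own method: it takes $v_1 = (1,1,0)$, $v_2 = (1,0,1)$, $v_3 = (1,1,1)$ (the basis consistent with the matrix $M$ found in Example 4.4.12), uses exact arithmetic from the standard `fractions` module, and assumes the input matrix is invertible. The helper names `inverse` and `matvec` are hypothetical.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions.
    Assumes A is invertible (otherwise the pivot search fails)."""
    n = len(A)
    # Augment A with the identity matrix.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matvec(matrix, x):
    return [sum(row[k] * x[k] for k in range(len(x))) for row in matrix]

# Columns are v1, v2, v3 written in standard coordinates.
P = [[1, 1, 1],
     [1, 0, 1],
     [0, 1, 1]]

M = inverse(P)             # change of basis matrix from the standard basis to B
print(M)
print(matvec(M, [1, 0, 0]))  # coordinates of e1 relative to B
```

Each column of `M` solves one of the systems $e_k = a v_1 + b v_2 + c v_3$ from the example, so inverting `P` solves all three systems at once.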


Often, when we are working with several transformations and several bases, it is helpful to be able to combine the matrix representations of the transformations with respect to the bases. Consider the following theorems, which combine matrix representations of multiple linear transformations. The proof of each follows from the properties of matrix multiplication and the definition of the matrix representation. The first theorem shows that matrix representations of linear transformations satisfy linearity properties themselves.


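Theorem 4.4.13

Let $T, U : V \to W$ be linear, $\alpha$ a scalar, and $V$ and $W$ be finite-dimensional vector spaces with ordered bases $B$ and $\tilde{B}$, respectively. Then

(a) $[T + U]_B^{\tilde{B}} = [T]_B^{\tilde{B}} + [U]_B^{\tilde{B}}$
(b) $[\alpha T]_B^{\tilde{B}} = \alpha [T]_B^{\tilde{B}}$.

Proof. Suppose $V$ and $W$ are finite-dimensional vector spaces with bases $B$ and $\tilde{B}$, respectively. Suppose $T, U : V \to W$ are linear.

(a) We show that $[T + U]_B^{\tilde{B}} [x]_B = \left([T]_B^{\tilde{B}} + [U]_B^{\tilde{B}}\right) [x]_B$ for arbitrary $x \in V$ so that, by Definition 4.2.25, $[T + U]_B^{\tilde{B}} = [T]_B^{\tilde{B}} + [U]_B^{\tilde{B}}$. Let $y = (T + U)(x)$, $y_T = T(x)$, and $y_U = U(x)$. We have $y = (T + U)(x) = T(x) + U(x) = y_T + y_U$. Because coordinate transformations are linear, $[y]_{\tilde{B}} = [y_T + y_U]_{\tilde{B}} = [y_T]_{\tilde{B}} + [y_U]_{\tilde{B}}$. And finally,

$$[T + U]_B^{\tilde{B}} [x]_B = [y]_{\tilde{B}} = [y_T]_{\tilde{B}} + [y_U]_{\tilde{B}} = [T]_B^{\tilde{B}} [x]_B + [U]_B^{\tilde{B}} [x]_B.$$

(b) Similar to part (a), let $\alpha$ be any scalar and $y = (\alpha T)(x)$ and $y_T = T(x)$. We have

$$[\alpha T]_B^{\tilde{B}} [x]_B = [y]_{\tilde{B}} = \alpha [y_T]_{\tilde{B}} = \alpha [T]_B^{\tilde{B}} [x]_B,$$

so that $[\alpha T]_B^{\tilde{B}} = \alpha [T]_B^{\tilde{B}}$.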


Theorem 4.4.13 suggests that, given finite-dimensional vector spaces $V$ and $W$, the set of all linear transformations with domain $V$ and codomain $W$ might also be a vector space over $\mathbb{R}$. We leave the exploration of this suggestion to Exercise 23.

The next theorem shows that the matrix representation of a composition of linear transformations is the product of the matrix representations (with respect to appropriately matched bases).

Theorem 4.4.14
Let $T : V \to W$ and $U : W \to X$ be linear. Let $V$, $W$, and $X$ be finite-dimensional vector spaces with ordered bases $B$, $B'$, and $\tilde{B}$, respectively. Then

$$[U \circ T]_B^{\tilde{B}} = [U]_{B'}^{\tilde{B}} [T]_B^{B'}.$$


The proof of this theorem is similar to the proof of Theorem 4.4.13 and is the subject of Exercise 19. 


4.4.4 Exercises 


For Exercises 1 through 12, find the matrix representation, $M = [T]_{B_V}^{B_W}$, of the given linear transformation $T : V \to W$, using the given bases $B_V$ and $B_W$. Be sure you can verify the given bases are indeed bases.


1. Let V be the space of objects with 4 voxels and W the space of radiographs with 4 pixels. Define 
T:V— Was 


t+ 2X2 
71 | t+ 2X4 
T aaa tt) 2 : 
Lo | £4 301 + 24 3X4 
iD Bi 
3%1 + 23+ gt 


Let By and By be the standard bases for the object and radiograph spaces, respectively. 
2. Consider the same transformation as in Exercise 1. Use the standard basis for the object space. 
Use the following basis for the radiograph space. 


1 1 1 1 

0 1 1 1 
By = ; ; , 

0 0 1 1 

0 0 0 1 


3. Consider the same transformation as in Exercise 1. Use the standard basis for the radiograph space. Use the following basis for the object space.


1/1 1| 0 1} 1 0/1 
By = ’ ’ ’ 
1| 0 1| 1 0} 1 1} 1 


4. Let V = Mox2(R) and W = R*. Define T as 




10. 


11. 


12. 
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- 11 11 11 10 . 4 
Let By = {(; i} : (; 5) ; ( 0) , ic a) and let By be the standard basis for R”. 


. Let V = W = P2(R) and T defined by T (ax? + bx +c) = cx? + ax +b, and let By = Bw = 


ie? 1 xe = 1,1), 


. Let V = W = P2(R) and T defined byT (ax? + bx +c) = (a+ b)x* —b +c and let By and 


By be the standard basis for P2(R). 


. Let V = A4(R) and W = R* with T defined by T(v) = [v]y, where Y is the basis given in 


Example 3.4.15 and let By be the standard basis for R*. 


. Let V = W = D(Z2) and T defined by T(x) = x +x, where By = By is the basis given in 


Example 3.4.18. 


. Consider the transformation of Exercise 12 in Section 4.2. Let By = By be the basis given in 


Example 3.4.15. 

Let $V = \mathcal{P}_3(\mathbb{R})$ with the basis $B = \{x^3,\ x^2 + 1,\ x + 1,\ 1\}$ and $W = \mathcal{P}_2(\mathbb{R})$ with the standard basis. Define $T$ as $T(ax^3 + bx^2 + cx + d) = 3ax^2 + 2bx + c$.

Let V = R? with the standard basis and W = R? with basis 


1 1 0 
B= 0], : 
0 1 
and define T as 
x x+y 
TiyJ=ly-z 
Zz 0 


Let V = M3,.2(R) with the standard basis and W = M 2,.2(R) with the basis 
B= 11 00 10 01 
~1lLoo/’\1il1/’?\01/’?\00 


a\\ a\2 
ay a\2 + a22 
T | a2) a2 | = 


a2 + 431 a32 
a3) 432 


and define T as 


For Exercises 13-17, choose common vector spaces V and W and a linear transformation T : V > W 
for which M is the matrix representation of T when using the standard bases for V and W. Check your 
answers with at least two examples. 


13. 


14. 


32 
#=(31) 
ae a 
) 
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-10 
15. M=]| 2 -1 
a) 
1001 
6. M=(59 94) 
i= tt 
itv e123 
2200 


Additional Exercises. 


18. Consider the space of 7-bar LCD images. Find the change of basis matrix [J iz € M7x7(Z2) 


where 
= => (<=) aE . == == 
, TAA 
‘LOU 

and 


By = “ ye Sy oe, | Se, 


{ , 
Ie "I = 7 
a» => Bo 
19. Prove Theorem 4.4.14. 
20. Can the matrix $M = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$ be the matrix representation of the identity transformation on a vector space of your choosing? Can $M$ be the matrix representation of the zero transformation on a vector space of your choosing? Explain.

21. A regional economic policy group is trying to understand relationships between party affiliation of elected officials and economic health. The table shows a summary of recent election data and economic health indicators. Let $v_k \in \mathbb{R}^3$ represent the percentage of voters aligning with three political parties in election year $k$. Let $w_k \in \mathbb{R}^2$ be the corresponding economic indicator values two years after the election in year $k$. Can economic health prediction be modeled as a linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$?

$$\begin{array}{c|c|c} \text{year} & v_k & w_k \\ \hline k = 5 & (40, 45, 10) & (30, 25) \\ k = 10 & (45, 35, 25) & (40, 35) \\ k = 15 & (35, 40, 20) & (35, 30) \end{array}$$




22. Find a basis for $H_2(\mathbb{R})$ for which the matrix representation of the heat diffusion transformation is a diagonal matrix. Repeat for $H_3(\mathbb{R})$. Would these bases be useful when computing the time evolution of a given heat state?

23. Given $V$ and $W$ with $\dim V = n$ and $\dim W = m$, show that

$$\mathcal{T} = \{T : V \to W\}$$

is a vector space. What is the dimension of $\mathcal{T}$?
24. Show that if $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation with $T(x) = Ax$ for some $m \times n$ matrix $A$, then the $i$th column of $A$ is $Ae_i$.


4.5 The Determinants of a Matrix 


In this section, we investigate the determinant, a number that describes the geometry of a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$.

We begin by considering transformations from $\mathbb{R}^2$ to $\mathbb{R}^2$, since these are the simplest to visualize. Recall that in Section 4.2, Exercises 14-27, we showed that any linear transformation $T$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ maps the unit square onto a (possibly degenerate) parallelogram that is determined by the two vectors $T(e_1)$ and $T(e_2)$. The vertices of the parallelogram are $0$, $T(e_1)$, $T(e_2)$, and $T(e_1) + T(e_2)$. We now develop this idea further.


Example 4.5.1 Consider the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ given by $T(x) = Ax$, where $A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$. Notice, in Figure 4.17, that $T(e_1) = e_1$ and $T(e_2) = 2e_2$, so the unit square is mapped to a rectangle that is one unit wide and two units tall, twice the area of the original square.

Fig. 4.17 The transformation $T$ stretches the unit square vertically by a factor of 2.

Next, what happens if we choose some small number $\delta$ and consider the smaller square determined by the vectors $\delta e_1$ and $\delta e_2$? This smaller square has area $\delta^2$ and its image is the rectangle determined by the vectors $T(\delta e_1) = \delta T(e_1) = \delta e_1$ and $T(\delta e_2) = \delta T(e_2) = 2\delta e_2$. So the image of the smaller square has area $2\delta^2$; the transformation has again doubled the area.

Moreover, even if we translate the square, the area of the image doubles: consider the smaller square in the previous paragraph translated by $\begin{pmatrix} a \\ b \end{pmatrix}$ (so the vertices are $(a, b)$, $(a + \delta, b)$, $(a + \delta, b + \delta)$, and $(a, b + \delta)$). The image of the square under the map $T$ has vertices

$$T\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ 2b \end{pmatrix}, \quad T\begin{pmatrix} a + \delta \\ b \end{pmatrix} = \begin{pmatrix} a + \delta \\ 2b \end{pmatrix},$$
$$T\begin{pmatrix} a + \delta \\ b + \delta \end{pmatrix} = \begin{pmatrix} a + \delta \\ 2b + 2\delta \end{pmatrix}, \quad T\begin{pmatrix} a \\ b + \delta \end{pmatrix} = \begin{pmatrix} a \\ 2b + 2\delta \end{pmatrix}.$$

These are the vertices of a rectangle of width $\delta$ and height $2\delta$ with lower left corner at the point $(a, 2b)$. In this case as well, the square of area $\delta^2$ has been transformed into a rectangle of area $2\delta^2$ (as can be seen in Figure 4.18).

So apparently, $T$ doubles the area of squares (whose sides are parallel to the coordinate axes). What about other shapes? Suppose we start with an irregular set (a "blob") in $\mathbb{R}^2$ and we want to calculate the area of the image of the blob under $T$. We can imagine overlaying a fine square grid over the blob; the area of any small grid square doubles under the transformation $T$, so the area of the blob must also double; see Figure 4.19.

Fig. 4.19 The transformation $T$ stretches the "blob" vertically by a factor of 2.

This is consistent with our understanding that the linear transformation $T$ preserves distances in the $e_1$-direction but stretches distances in the $e_2$-direction by 2.


Example 4.5.2 Consider the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ given by $T(x) = Ax$, where $A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$. Notice that $T(e_1) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $T(e_2) = e_2$, so the unit square is mapped to the parallelogram shown in Figure 4.20. This parallelogram also has unit area.

Fig. 4.20 The transformation $T$ transforms the unit square to a parallelogram of the same area.

As before, we consider the image of the smaller square determined by the vectors $\delta e_1$ and $\delta e_2$. This smaller square has area $\delta^2$ and its image is the parallelogram determined by the vectors $T(\delta e_1) = \delta T(e_1) = \delta \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $T(\delta e_2) = \delta T(e_2) = \delta e_2$. This parallelogram has area $\delta^2$, equal to the area of the square with side lengths $\delta$.

Also, following an argument parallel to that in the previous example, any square of side length $\delta$ is mapped to a parallelogram of the same area. In addition, by the same grid argument as before, this transformation will not change the area of a blob, though it does "shear" the shape.


In the previous two examples, we considered the area of the image of the unit square $T(S)$, and this number, in each case, was the amount that each transformation increased area. It turns out that any linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$ will change areas of sets in $\mathbb{R}^2$ by a fixed multiplicative amount, independent of the set! And this factor can be determined just by looking at where the transformation sends the unit square!


Example 4.5.3 For each of the following matrices $A$, consider the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T(x) = Ax$. Find the factor by which each transformation increases area. That is, find the area of the image of the unit square under each of these transformations. (Hint: sketch the image of the unit square in each case.)


L172 
© (0 4f) 


0) 
1/2 1/2 
(c) (_ 1/2 :) 
1/21 
© tae . 
o() 


It turns out that there is a formula that will compute the area distortion factor of a linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$, given the matrix representation $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ of the transformation. Specifically, the quantity $a \cdot d - b \cdot c$ is called the determinant of $A$, and its absolute value $|a \cdot d - b \cdot c|$ is the factor by which area is distorted.

Let's check this for the linear transformation in Example 4.5.1. We find that the determinant of $A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$ is $1 \cdot 2 - 0 \cdot 0 = 2$. This corresponds to our discovery that this linear transformation doubles area.

Similarly, for the linear transformation in Example 4.5.2, the determinant of $A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ is $1 \cdot 1 - 0 \cdot 1 = 1$, which reflects the fact that this transformation did not change area.

Now, for each matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ in Example 4.5.3 above, calculate the quantity $a \cdot d - b \cdot c$. You should see that indeed the absolute value of the determinant $|a \cdot d - b \cdot c|$ corresponds to the area distortion factor in each case.

Fig. 4.21 A change in sign in the determinant shows as a "flip" of the image.
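
The claim that $|a \cdot d - b \cdot c|$ is the area of the image of the unit square can be spot-checked directly. The Python sketch below is an illustration, not part of the original text: it lists the vertices $0$, $T(e_1)$, $T(e_1) + T(e_2)$, $T(e_2)$ of the image parallelogram and computes its signed area with the shoelace formula, a standard polygon-area formula that is swapped in here as an independent check. The function names are hypothetical.

```python
def image_of_unit_square(a, b, c, d):
    """Vertices of the image of the unit square under A = [[a, b], [c, d]],
    listed in order around the boundary: 0, T(e1), T(e1) + T(e2), T(e2)."""
    return [(0, 0), (a, c), (a + b, c + d), (b, d)]

def signed_area(vertices):
    """Shoelace formula: positive if the vertices run counterclockwise,
    negative if they run clockwise."""
    total = 0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2

def det2(a, b, c, d):
    return a * d - b * c

# Examples 4.5.1 and 4.5.2, followed by a matrix that swaps e1 and e2.
for entries in [(1, 0, 0, 2), (1, 0, 1, 1), (0, 1, 1, 0)]:
    print(entries, signed_area(image_of_unit_square(*entries)), det2(*entries))
```

For this vertex ordering the signed area agrees with $a \cdot d - b \cdot c$ itself, not only with its absolute value: the sign records whether the boundary of the image is traversed in the same direction as the boundary of the unit square.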
So we have a good geometric interpretation for the absolute value of the determinant of a 2 x 2 


02 20 
(from parts (a) and (d) of Example 4.5.3) distort area by a factor of 2, but their determinants have the 
opposite sign; the determinant of A is 2 while the determinant of B is —2. In the following example 
we explore the geometry behind the difference in sign of the determinant for these two maps. 


matrix, but what about the sign of this quantity? Both of the matrices A = ¢ 5) and B= € 7 


Example 4.5.4 Consider, again, the transformations $T_A$ and $T_B$ from above. We will apply these transformations to an image of the kanji for "moon" as seen in Figure 4.21. The image of the unit square is the same parallelogram in both cases but the sides come from different sides of the unit square. In particular, notice that in the linear transformation associated with $B$, the image of the unit square (and the kanji) has been "flipped over," but that in the linear transformation associated with $A$, the image of the unit square (and the kanji) has not. We say that $T_A$ preserves orientation and $T_B$ reverses orientation.

Fig. 4.22 The right-hand rule: Orientation is positive if $x_3$ points in the direction of the thumb and negative if $x_3$ points in the opposite direction.


Fig.4.23 Left: Positive orientation. Right: Negative orientation. 


For a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$, we can analogously consider the volume expansion factor and whether the transformation is orientation preserving or reversing. We now describe the generalizations of these two geometric ideas for $n = 3$.²

A linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ maps a 3D cube to a (possibly degenerate) 3D parallelepiped whose corners are the images of the original cube's corners. We easily see that the factor by which $T$ changes volume is just the volume of the image parallelepiped (because the volume of the unit cube is 1). As in two dimensions, we attach a sign to this factor that records whether the orientation of the parallelepiped is different from that of the cube. Here, the idea of orientation is a little more complicated, but can still be visualized in the following way.

We will define whether $T$ preserves or reverses orientation by first defining the orientation of a triple, or ordered set of (linearly independent) vectors. If three vectors $x_1$, $x_2$, and $x_3$ in $\mathbb{R}^3$ are linearly independent, then $x_1$ and $x_2$ span a plane, which separates $\mathbb{R}^3$ into two pieces, and $x_3$ must lie on one side or the other of this plane. Which side $x_3$ lies on determines whether or not the triple $(x_1, x_2, x_3)$ has positive orientation. By convention we use the "right-hand rule": if you point the index finger of your right hand in the direction of $x_1$ and the second finger toward $x_2$, then your thumb points to the positive orientation side of the plane (see Figure 4.22). If $x_3$ lies on this side of the plane then the triple $(x_1, x_2, x_3)$ has positive orientation; if $x_3$ lies on the other side then the triple has negative orientation. (See Figure 4.23.)
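
The right-hand rule can be turned into a small computation. The sketch below is an illustration that goes slightly beyond the text: it uses the cross product $x_1 \times x_2$, which points toward the thumb side of the plane spanned by $x_1$ and $x_2$ under the right-hand rule, and then checks on which side $x_3$ lies by taking a dot product. The function names are hypothetical, and vectors are assumed to be given as 3-tuples of numbers.

```python
def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orientation(x1, x2, x3):
    """Classify the ordered triple (x1, x2, x3) using the right-hand rule."""
    side = dot(cross(x1, x2), x3)
    if side > 0:
        return "positive"
    if side < 0:
        return "negative"
    return "degenerate"  # x3 lies in the plane of x1 and x2 (or x1, x2 are parallel)

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(orientation(e1, e2, e3))  # the standard triple
print(orientation(e2, e1, e3))  # the same vectors with the first two swapped
```

Swapping two vectors of the triple moves the thumb to the other side of the plane, which is why the second call reports the opposite orientation.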

The 3 edges of the cube emanating from the coordinate origin correspond to the unit vectors $e_1$, $e_2$, and $e_3$, and the triple $(e_1, e_2, e_3)$ has positive orientation. Hence, we can check the corresponding edges of the image of the cube: $T(e_1)$, $T(e_2)$, and $T(e_3)$. If the triple $(T(e_1), T(e_2), T(e_3))$ has positive orientation then we say that the transformation preserves orientation, and if it has negative orientation (in other words, the orientation of the parallelepiped has "flipped" from the original orientation of the cube) then we say that the transformation reverses orientation.

² Although both volume expansion and orientation can be extended to dimensions higher than 3, we do not include them here.

The determinant of a matrix, which we will define algebraically in the next section, is geometrically interpreted as the product of the volume expansion factor and $\pm 1$ (depending on whether the map preserves or reverses orientation). This geometric intuition can yield additional insights, so we encourage readers to keep both perspectives in mind.


4.5.1 Determinant Calculations and Algebraic Properties 


Now that we have a geometric interpretation for the determinant of a square matrix, we will focus on an algebraic definition of the determinant. Indeed our geometric intuition suggests some of the algebraic properties that the determinant should possess. We give two motivating examples before proceeding with the definition.

As a map from $\mathbb{R}^n$ to $\mathbb{R}^n$, a diagonal matrix maps the unit $n$-cube to an $n$-dimensional rectangular parallelepiped whose side lengths are equal to the (absolute values of) the diagonal entries of the matrix. Hence, we would want our definition to assign the determinant of a diagonal matrix to be the product of the diagonal entries.

Also, suppose that two $n \times n$ matrices $A$ and $B$ are identical in every position except along the $k$th row, where the entries of $B$ are $\alpha$ times the entries of $A$ for some $\alpha > 0$. We can compare the parallelepipeds that are the images of the unit $n$-cube under $T_A$ and $T_B$. The parallelepiped from $T_B$ is the same as the parallelepiped from $T_A$ except that under $T_B$ all edges are stretched by a factor of $\alpha$ in the direction $e_k$. This results in the parallelepiped from $T_B$ having $\alpha$ times the volume of the one from $T_A$.

We now define the determinant as a function that assigns a real number to every matrix and satisfies the following properties. As you read the properties, check that they are consistent with our geometric understanding of the determinant.


Definition 4.5.5

Let $\mathcal{M}_{n \times n}$ be the set of all $n \times n$ matrices. We define the determinant as the function $\det : \mathcal{M}_{n \times n} \to \mathbb{R}$ with the following properties. For a number $\alpha$ (real or complex) and $n \times n$ matrices $A, B \in \mathcal{M}_{n \times n}$,

• If $A$ is in echelon form, then $\det(A)$ is the product of the diagonal elements.

• If $B$ is obtained by performing the row operation $R_k = \alpha r_j + r_k$ on $A$ (replacing a row by the sum of the row and a multiple of another row), then $\det(B) = \det(A)$.

• If $B$ is obtained by performing the row operation $R_k = \alpha r_k$ on $A$ (replacing a row by a multiple of the row), then $\det(B) = \alpha \det(A)$.

• If $B$ is obtained by performing the row operation $R_k = r_j$ and $R_j = r_k$ on $A$ (interchanging two rows), then $\det(B) = -1 \cdot \det(A)$.

Notation. It is common, when context permits, to write $|M|$ and mean $\det(M)$.

There are two important points to consider about this definition. First, you may notice that this definition only explicitly states the determinant of a matrix if it is in echelon form. However, we can row reduce any matrix to echelon form through elementary row operations, and the remaining properties all show how the determinant changes as these row operations are applied. Hence, we can use Definition 4.5.5 to find the determinant of a matrix by keeping track of how the determinant changes at each step of a matrix reduction and working backwards from the determinant of the row-reduced matrix.

Second, there are many ways to row reduce a matrix. This means that we must show that any sequence of row operations that reduces the matrix to echelon form will change the determinant by the same amount, and that hence there is no ambiguity³ in our definition. We will show this in Theorem 4.5.10 below, but before doing this we will give three examples illustrating how to use the definition to compute determinants.


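Example 4.5.6 Find the determinant of

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ -1 & 1 & -2 \end{pmatrix}.$$

Our goal is to reduce $A$ to echelon form all the while keeping track of how the determinant changes according to the properties listed above. We can build a table to keep track of our operations, matrices, and determinants. (In each row operation, lowercase $r_j$ denotes a row of the matrix in the previous line of the table.)

$$\begin{array}{c|c|c} \text{Row Operations} & \text{Matrix} & \text{Determinant} \\ \hline & \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ -1 & 1 & -2 \end{pmatrix} & \det(A) \\ R_3 = r_1 + r_3 & \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 2 & -1 \end{pmatrix} & \det(A) \\ R_2 = -r_1 + r_2 & \begin{pmatrix} 1 & 1 & 1 \\ 0 & -1 & 0 \\ 0 & 2 & -1 \end{pmatrix} & \det(A) \\ \begin{array}{l} R_2 = -r_2 \\ R_3 = 2r_2 + r_3 \end{array} & \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix} & -\det(A) \\ R_3 = -r_3 & \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} & \det(A) \end{array}$$

Now, using the first property of Definition 4.5.5, we find that

$$\det \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = 1.$$

Matching the determinant of the reduced matrix with the far right column, we find that $\det(A) = 1$. Recall that geometrically, this tells us that the linear transformation $T_A$ preserves volume (and orientation).

³ In mathematical terminology, we say that the determinant is well-defined.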


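Example 4.5.7 Find the determinant of

$$A = \begin{pmatrix} 2 & 2 & 2 \\ 1 & 0 & 1 \\ -2 & 2 & -4 \end{pmatrix}.$$

Again, we reduce $A$ to echelon form all the while keeping track of how the determinant changes.

$$\begin{array}{c|c|c} \text{Row Operations} & \text{Matrix} & \text{Determinant} \\ \hline & \begin{pmatrix} 2 & 2 & 2 \\ 1 & 0 & 1 \\ -2 & 2 & -4 \end{pmatrix} & \det(A) \\ R_1 = \tfrac{1}{2} r_1 & \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ -2 & 2 & -4 \end{pmatrix} & \tfrac{1}{2}\det(A) \\ \begin{array}{l} R_2 = -r_1 + r_2 \\ R_3 = 2r_1 + r_3 \end{array} & \begin{pmatrix} 1 & 1 & 1 \\ 0 & -1 & 0 \\ 0 & 4 & -2 \end{pmatrix} & \tfrac{1}{2}\det(A) \\ \begin{array}{l} R_2 = -r_2 \\ R_3 = 4r_2 + r_3 \end{array} & \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{pmatrix} & -\tfrac{1}{2}\det(A) \\ R_3 = -\tfrac{1}{2} r_3 & \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} & \tfrac{1}{4}\det(A) \end{array}$$

Now, using the first property in Definition 4.5.5, we see that

$$\det \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = 1.$$

Again, matching the determinant of the reduced matrix with the far right column gives us that $\tfrac{1}{4}\det(A) = 1$ and so $\det(A) = 4$.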

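We conclude that the transformation $T_A$ expands volume by a factor of 4, while preserving orientation.

Example 4.5.8 Find the determinant of

$$A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 2 & 1 \\ 2 & 3 & 0 \end{pmatrix}.$$

Again, we reduce $A$ to echelon form all the while keeping track of how the determinant changes.

$$\begin{array}{c|c|c} \text{Row Operations} & \text{Matrix} & \text{Determinant} \\ \hline & \begin{pmatrix} 1 & 1 & -1 \\ 1 & 2 & 1 \\ 2 & 3 & 0 \end{pmatrix} & \det(A) \\ R_2 = -r_1 + r_2 & \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ 2 & 3 & 0 \end{pmatrix} & \det(A) \\ R_3 = -2r_1 + r_3 & \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ 0 & 1 & 2 \end{pmatrix} & \det(A) \\ R_3 = -r_2 + r_3 & \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix} & \det(A) \end{array}$$

Now, using the first property in Definition 4.5.5, we see that

$$\det \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix} = 0.$$

Finally, matching the determinant of the reduced matrix with the far right column gives us that $\det(A) = 0$. Geometrically, we conclude that the map $T_A$ maps the unit cube to a degenerate parallelepiped; one that has zero volume.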
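
The bookkeeping in the tables above can be automated. The following Python sketch is one possible implementation of the procedure described after Definition 4.5.5, not the authors' code: it reduces a matrix to echelon form with leading ones, tracks the multiple that each row operation applies to the determinant, and then works backwards. The function name `det_by_row_reduction` and the choice of pivoting order are assumptions made for illustration; exact arithmetic uses the standard `fractions` module.

```python
from fractions import Fraction

def det_by_row_reduction(A):
    """Determinant via Definition 4.5.5: row reduce and track how each
    row operation changes the determinant."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    factor = Fraction(1)  # invariant: det(M) = factor * det(A)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            continue  # the diagonal entry stays 0, so the final product is 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]  # row swap: multiplies det by -1
            factor *= -1
        p = M[col][col]
        M[col] = [x / p for x in M[col]]         # scale row by 1/p: multiplies det by 1/p
        factor *= 1 / p
        for r in range(col + 1, n):              # add multiples of a row: det unchanged
            f = M[r][col]
            M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    diagonal_product = Fraction(1)
    for i in range(n):
        diagonal_product *= M[i][i]
    return diagonal_product / factor

print(det_by_row_reduction([[2, 2, 2], [1, 0, 1], [-2, 2, -4]]))  # Example 4.5.7
print(det_by_row_reduction([[1, 1, -1], [1, 2, 1], [2, 3, 0]]))   # Example 4.5.8
```

The sequence of row operations chosen by the code differs from the sequences in the tables, yet the results agree with the examples; this is the well-definedness question that Theorem 4.5.10 addresses.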


In Example 4.5.8, the reduced echelon form of $A$ has a zero along the diagonal and therefore the determinant is zero. Here we present this result more generally.

Theorem 4.5.9
Let $A$ be an $n \times n$ matrix. If $A$ is row equivalent to a matrix with a row of zeros, then $\det(A) = 0$. Conversely, if $\det(A) = 0$, then $A$ is row equivalent to a matrix with a row of zeros.

The proof is Exercise 32. We will use this fact in several of the proofs that follow.

We now return to the issue of showing that the determinant of a matrix does not depend on which sequence of row operations is used to reduce the matrix. In order to do this, it will be helpful to use the language of matrix reduction using elementary matrices from Section 2.2.3. Recall that a matrix reduction can be performed through multiplication by elementary matrices. Recall also that the elementary matrix for a row operation is obtained by performing that row operation on the identity matrix.

We will consider determinants of the three types of elementary matrices: (1) multiplication of a row by a nonzero number, (2) addition of a multiple of a row to another, and (3) interchanging rows.

By Definition 4.5.5,

• the determinant of an elementary matrix $E$ that multiplies a row by a nonzero number $\alpha$ is $\det(E) = \alpha$,

• the determinant of an elementary matrix $E$ that adds a multiple of a row to another row is $\det(E) = 1$; and

• the determinant of an elementary matrix $E$ that switches two rows is $\det(E) = -1$.


With this notational tool, we are ready to show that our definition for the determinant of a matrix 
is well defined; that is, the calculation of the determinant of a matrix using Definition 4.5.5 does not 
depend on the sequence of row operations used to reduce the matrix. 


Theorem 4.5.10 
Let A be an n × n matrix. Then det(A) is uniquely determined by Definition 4.5.5.


Proof. Let us consider the matrix A whose reduced echelon form is A′. We have two cases to consider:
(1) A′ ≠ I and (2) A′ = I. In the first case, we know that A′ has at least one row of all zeros and therefore,
by Definition 4.5.5, det(A) = 0. Since the reduced echelon form of a matrix is unique, the determinant
for case (1) is always 0.

Next, let us consider the case where A reduces to I. This means that we have k elementary matrices
$E_1, E_2, \ldots, E_k$ so that $I = E_k \cdots E_2 E_1 A$. Suppose also that the row operation corresponding to
elementary matrix $E_\ell$ applies a multiple $\alpha_\ell$ to the determinant calculation so that $1 = \alpha_1 \alpha_2 \cdots \alpha_k \det(A)$.
Now, suppose we find another sequence of row operations to reduce A to I corresponding to m
elementary matrices $\tilde{E}_1, \tilde{E}_2, \ldots, \tilde{E}_m$ so that $I = \tilde{E}_m \cdots \tilde{E}_2 \tilde{E}_1 A$ and with $\tilde{E}_p$ applying a multiple $\beta_p$
to the determinant calculation so that $1 = \beta_1 \beta_2 \cdots \beta_m \det(A)$. Notice that all row operations
can be undone:


• Multiplying row i by a nonzero constant α is undone by multiplying row i by 1/α.
• Adding α times row i to row j is undone by adding −α times row i to row j.
• Changing the order of two rows is undone by changing them back.


Therefore, there are elementary matrices $E'_1, E'_2, \ldots, E'_k$ that undo the row operations corresponding
to elementary matrices $E_1, E_2, \ldots, E_k$. In this case, $E'_\ell$ applies a multiple of $1/\alpha_\ell$ to the determinant
calculation so that
\[
\det(A) = \frac{1}{\alpha_1 \alpha_2 \cdots \alpha_k} \cdot 1.
\]


Now, we see that
\[
E'_1 E'_2 \cdots E'_k I = A.
\]
Therefore, we can perform row operations on A to get back to I as follows.
\[
(\tilde{E}_m \cdots \tilde{E}_2 \tilde{E}_1)(E'_1 E'_2 \cdots E'_k I) = (\tilde{E}_m \cdots \tilde{E}_2 \tilde{E}_1) A = I.
\]


Therefore, by Definition 4.5.5,
\[
\det(I) = \frac{\beta_1 \beta_2 \cdots \beta_m}{\alpha_1 \alpha_2 \cdots \alpha_k}.
\]


But, since det(I) = 1, we have $\beta_1 \beta_2 \cdots \beta_m = \alpha_1 \alpha_2 \cdots \alpha_k$ and det(A) is uniquely defined.


In addition to reassuring us that Definition 4.5.5 is unambiguous, the proof above also shows that 
if A is any n x n matrix and E is any n x n elementary matrix, then we have 


det(EA) = det(E) det(A).


In fact, it is always the case that det(AB) = det(A) det(B). We present this result as a theorem. 


Theorem 4.5.11 
Let A and B be n × n matrices. Then det(AB) = det(A) det(B).


Similar to the proof of Theorem 4.5.10, the proof uses elementary matrices corresponding to A and 
B and also considers cases depending on the reduced echelon form of A and B. 


Proof. Let A′ denote the reduced echelon form of A and let B′ denote the reduced echelon form of B.
Then we can perform row operations on A′ through multiplication by m elementary matrices to get to
A. That is, there are elementary matrices $E_i$, i = 1, 2, ..., m, so that
\[
A = E_1 E_2 \cdots E_m A'.
\]
And, we can perform row operations on B′ through multiplication by k elementary matrices to get to
B. So, there are elementary matrices $\tilde{E}_j$, j = 1, 2, ..., k, so that
\[
B = \tilde{E}_1 \tilde{E}_2 \cdots \tilde{E}_k B'.
\]
By the definition of determinant, we see that $\det(A) = \det(E_1) \det(E_2) \cdots \det(E_m) \det(A')$ and
$\det(B) = \det(\tilde{E}_1) \det(\tilde{E}_2) \cdots \det(\tilde{E}_k) \det(B')$. Now, we consider cases (1) A′ and B′ are both I and
(2) at least one of A′ or B′ is not I.


(1) In the case where both A′ = I and B′ = I, we have
\[
AB = (E_1 E_2 \cdots E_m A')(\tilde{E}_1 \tilde{E}_2 \cdots \tilde{E}_k B') = E_1 E_2 \cdots E_m \tilde{E}_1 \tilde{E}_2 \cdots \tilde{E}_k.
\]
This means that AB reduces to I. Again, by the definition of the determinant,
\[
\det(AB) = (\det(E_1) \det(E_2) \cdots \det(E_m))(\det(\tilde{E}_1) \det(\tilde{E}_2) \cdots \det(\tilde{E}_k) \det(I)) = \det(A) \det(B).
\]

(2) Now, we consider the case where at least one of A′ or B′ is not the identity matrix. Suppose, first,
that B′ ≠ I. Then we see that AB reduces to A′B′ through the matrix multiplication
\[
(E_1 E_2 \cdots E_m A')(\tilde{E}_1 \tilde{E}_2 \cdots \tilde{E}_k B') = E_1 E_2 \cdots E_m A' \tilde{E}_1 \tilde{E}_2 \cdots \tilde{E}_k B'.
\]
Therefore, by the definition of the determinant,
\[
\det(AB) = (\det(E_1) \det(E_2) \cdots \det(E_m))(\det(\tilde{E}_1) \det(\tilde{E}_2) \cdots \det(\tilde{E}_k) \det(B')) = \det(A) \det(B).
\]
In fact, det(B′) = 0 because B′ has a row of all zeros. Similar investigations lead to det(AB) = 0
for the case when A′ is not the identity. (See Exercise 23.)
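A quick numerical check of Theorem 4.5.11 can be reassuring. The sketch below is ours and not part of the proof: the matrices `A`, `B`, and `S` are chosen only for illustration, and `det` is a compact row-reduction determinant that assumes the row operation rules of this section.

```python
from fractions import Fraction

def det(rows):
    """Determinant by reduction to triangular form: sign from swaps times the pivots."""
    a = [[Fraction(x) for x in row] for row in rows]
    n, factor = len(a), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)  # a row of zeros appears in the echelon form
        if p != c:
            a[c], a[p] = a[p], a[c]
            factor = -factor  # row swap
        factor *= a[c][c]
        for r in range(c + 1, n):
            m = a[r][c] / a[c][c]
            a[r] = [x - m * y for x, y in zip(a[r], a[c])]  # determinant unchanged
    return factor

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[2, 2, 2], [1, 0, 1], [-2, 2, -4]]
B = [[1, 0, 2], [3, 1, 0], [0, 1, 1]]
S = [[1, 1, -1], [1, 2, 1], [2, 3, 0]]  # reduces to a matrix with a row of zeros

print(det(matmul(A, B)) == det(A) * det(B))
print(det(matmul(S, B)))
```

The second check illustrates case (2) of the proof: when one factor has determinant zero, so does the product.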


We now present another method for computing determinants. This method is called cofactor 
expansion. 


Lemma 4.5.12 
The determinant of a 1 × 1 matrix M is just the value of its entry. That is, if M = [a], then
det(M) = a.


The proof immediately follows from the fact that M is already in echelon form. 
We use the determinant of a 1 x 1 matrix to find det(M) = |M| for any n x n matrix M using the 
following iterative computation. 


Theorem 4.5.13 
Let $A = (a_{i,j})$ be an n × n matrix, n ≥ 2.


• (Expansion along ith row) For any i so that 1 ≤ i ≤ n,
\[
|A| = \sum_{j=1}^{n} (-1)^{i+j} a_{i,j} |M_{i,j}|.
\]


• (Expansion along jth column) For any j so that 1 ≤ j ≤ n,
\[
|A| = \sum_{i=1}^{n} (-1)^{i+j} a_{i,j} |M_{i,j}|.
\]


Here, $M_{i,j}$ is the sub-matrix of A where the ith row and jth column have been removed.

We omit the proof, but encourage the curious reader to consider how you might prove this theorem.⁴


Let us use the cofactor expansion along the first row to compute the determinant of a 2 x 2 matrix. 


Example 4.5.14 Let $A = \begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix}$, then
\[
|A| = a_{1,1}|M_{1,1}| - a_{1,2}|M_{1,2}|.
\]
But $M_{1,1} = [a_{2,2}]$ and $M_{1,2} = [a_{2,1}]$, so we have $|A| = a_{1,1}a_{2,2} - a_{1,2}a_{2,1}$. Happily, we have recovered
the familiar formula for the determinant of a 2 × 2 matrix.


If n is larger than 2, this process is iterative: the determinant of each submatrix $M_{i,j}$ is determined
by another cofactor expansion. This process is repeated until the sub-matrices are 2 × 2.
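The iterative process just described is naturally recursive. The following Python sketch is one possible rendering, with names of our choosing (`minor`, `det_along_row`, `det_along_column`). Indices are 0-based in the code, which does not change the sign $(-1)^{i+j}$, and for simplicity the recursion stops at 1 × 1 matrices (Lemma 4.5.12) rather than at 2 × 2.

```python
def minor(m, i, j):
    """Sub-matrix M_{i,j}: remove row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det_along_row(m, i=0):
    """Cofactor expansion along row i."""
    if len(m) == 1:
        return m[0][0]  # a 1 x 1 matrix has its entry as determinant
    return sum((-1) ** (i + j) * m[i][j] * det_along_row(minor(m, i, j))
               for j in range(len(m)))

def det_along_column(m, j=0):
    """Cofactor expansion along column j."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** (i + j) * m[i][j] * det_along_row(minor(m, i, j))
               for i in range(len(m)))

M = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]  # an arbitrary test matrix
print([det_along_row(M, i) for i in range(3)])
print([det_along_column(M, j) for j in range(3)])
```

Every row and every column should produce the same value. Note that this recursion does roughly n! work, so it is only sensible for small matrices.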

We begin with a computation for a general 3 × 3 and then a general 4 × 4 matrix. We will follow
these computations with an example computing the determinant of a specific matrix.


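Example 4.5.15 First, we consider an example when n = 3. Let
\[
A = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{pmatrix},
\]
then
\[
|A| = a_{1,1} \begin{vmatrix} a_{2,2} & a_{2,3} \\ a_{3,2} & a_{3,3} \end{vmatrix}
- a_{1,2} \begin{vmatrix} a_{2,1} & a_{2,3} \\ a_{3,1} & a_{3,3} \end{vmatrix}
+ a_{1,3} \begin{vmatrix} a_{2,1} & a_{2,2} \\ a_{3,1} & a_{3,2} \end{vmatrix}.
\]
From here, we use the formula for the determinant of a 2 × 2 matrix. Thus
\[
|A| = a_{1,1}(a_{2,2}a_{3,3} - a_{3,2}a_{2,3}) - a_{1,2}(a_{2,1}a_{3,3} - a_{2,3}a_{3,1}) + a_{1,3}(a_{2,1}a_{3,2} - a_{2,2}a_{3,1}).
\]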


We continue by finding the determinant of a general 4 x 4 matrix. 


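Example 4.5.16 Let
\[
A = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\ a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\ a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\ a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4} \end{pmatrix},
\]
then, expanding about the first row gives
\[
\begin{aligned}
|A| = {} & a_{1,1} \begin{vmatrix} a_{2,2} & a_{2,3} & a_{2,4} \\ a_{3,2} & a_{3,3} & a_{3,4} \\ a_{4,2} & a_{4,3} & a_{4,4} \end{vmatrix}
- a_{1,2} \begin{vmatrix} a_{2,1} & a_{2,3} & a_{2,4} \\ a_{3,1} & a_{3,3} & a_{3,4} \\ a_{4,1} & a_{4,3} & a_{4,4} \end{vmatrix} \\
& + a_{1,3} \begin{vmatrix} a_{2,1} & a_{2,2} & a_{2,4} \\ a_{3,1} & a_{3,2} & a_{3,4} \\ a_{4,1} & a_{4,2} & a_{4,4} \end{vmatrix}
- a_{1,4} \begin{vmatrix} a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \\ a_{4,1} & a_{4,2} & a_{4,3} \end{vmatrix}.
\end{aligned}
\]
From here, we finish by employing a cofactor expansion formula for the determinant of a 3 × 3 matrix
and then the formula for a 2 × 2 matrix (as in Example 4.5.15 and Example 4.5.14 above).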


4 Hint: Can you show that a row expansion satisfies the properties of Definition 4.5.5? What about a column expansion? 
In the previous two examples, we expanded about the first row. If it better suits us, we can expand
about any other row or any column, and we will arrive at the same number. We demonstrate with the 
following specific example. 


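Example 4.5.17 Compute the determinant of the given matrix.
\[
A = \begin{pmatrix} 2 & 2 & 2 \\ 1 & 0 & 1 \\ -2 & 2 & -4 \end{pmatrix}
\]
First, as before, we can expand along the first row. Then
\[
\begin{aligned}
|A| &= 2 \begin{vmatrix} 0 & 1 \\ 2 & -4 \end{vmatrix} - 2 \begin{vmatrix} 1 & 1 \\ -2 & -4 \end{vmatrix} + 2 \begin{vmatrix} 1 & 0 \\ -2 & 2 \end{vmatrix} \\
&= 2(0(-4) - 1(2)) - 2(1(-4) - 1(-2)) + 2(1(2) - 0(-2)) \\
&= -4 + 4 + 4 \\
&= 4.
\end{aligned}
\]
We will now perform a cofactor expansion along the second column.
\[
\begin{aligned}
|A| &= -2 \begin{vmatrix} 1 & 1 \\ -2 & -4 \end{vmatrix} + 0 \begin{vmatrix} 2 & 2 \\ -2 & -4 \end{vmatrix} - 2 \begin{vmatrix} 2 & 2 \\ 1 & 1 \end{vmatrix} \\
&= -2(1(-4) - 1(-2)) + 0 - 2(2(1) - 2(1)) \\
&= -2(-4 - (-2)) \\
&= 4.
\end{aligned}
\]
Both calculations yield the same result (indeed, expanding around any row or column must give the
same result). However, the second calculation is a little bit simpler because we only needed to compute
2 of the 3 subdeterminants. We often choose the row/column to perform the expansion to be the one
containing the most zeros.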
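The heuristic of expanding about the row or column with the most zeros is easy to automate. The sketch below is our own illustration (the function name `det_sparse_expansion` is hypothetical): at every level of the recursion it picks the row or column containing the most zeros and skips the zero entries entirely.

```python
def minor(m, i, j):
    """Remove row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det_sparse_expansion(m):
    """Cofactor expansion about whichever row or column contains the most zeros."""
    n = len(m)
    if n == 1:
        return m[0][0]
    # Each candidate line is a list of (row, column) positions.
    rows = [[(i, j) for j in range(n)] for i in range(n)]
    cols = [[(i, j) for i in range(n)] for j in range(n)]
    best = max(rows + cols, key=lambda line: sum(m[i][j] == 0 for i, j in line))
    # Zero entries contribute nothing, so their subdeterminants are never computed.
    return sum((-1) ** (i + j) * m[i][j] * det_sparse_expansion(minor(m, i, j))
               for i, j in best if m[i][j] != 0)

A = [[2, 2, 2], [1, 0, 1], [-2, 2, -4]]  # the matrix of Example 4.5.17
print(det_sparse_expansion(A))
```

For the matrix of Example 4.5.17, the second row and the second column each contain one zero, so the code expands about the second row (the first candidate found) and computes two subdeterminants, just as the second-column expansion does in the text.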


The following example further illustrates the idea of choosing a convenient row or column about 
which to expand. 


Example 4.5.18 Let A be the matrix
\[
A = \begin{pmatrix} 1 & 5 & -3 & 9 \\ 1 & 0 & 1 & 7 \\ 0 & 1 & 0 & 0 \\ 3 & 2 & 5 & -2 \end{pmatrix}.
\]
Compute det(A).


Here we notice all the zeros in the third row and choose to expand about row 3, giving us one term 
in the first step, requiring a determinant of one 3 x 3 matrix. 
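\[
\begin{aligned}
|A| &= -1 \begin{vmatrix} 1 & -3 & 9 \\ 1 & 1 & 7 \\ 3 & 5 & -2 \end{vmatrix} \\
&= -1 \left( 1 \begin{vmatrix} 1 & 7 \\ 5 & -2 \end{vmatrix} - 1 \begin{vmatrix} -3 & 9 \\ 5 & -2 \end{vmatrix} + 3 \begin{vmatrix} -3 & 9 \\ 1 & 7 \end{vmatrix} \right) \\
&= -\big( 1(1(-2) - 5(7)) - 1((-3)(-2) - 5(9)) + 3((-3)(7) - 9(1)) \big) \\
&= -(-37 - (-39) + 3(-30)) = 88.
\end{aligned}
\]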


Such choices, as we made here, can simplify the calculations. There are many choices and some might 
seem easier. We encourage the reader to attempt other choices of columns or rows about which to 
expand in order to become comfortable choosing a less tedious path. 


Cofactor expansion gives us an easy way to prove the following useful property of the determinant. 


Theorem 4.5.19 
Given an n × n matrix A, det(Aᵀ) = det(A).


The proof is Exercise 29. 

Our primary motivation in this section has been to explore determinants as a means of understanding 
the geometry of linear transformations. However, as some of the proofs here suggest, determinants 
are intimately connected with other linear algebraic concepts, such as span. We will develop these 
connections more in future chapters, but to whet your appetite, we close this section with a theorem 
relating the determinant to the set of solutions to the matrix equation Ax = b. 


Theorem 4.5.20 
Let A be an n × n matrix. The matrix equation Ax = 0 has one solution if and only if det(A) ≠ 0.


The proof is Exercise 35. 


Exercises 


For Exercises 1-4, draw the image of the unit square under the transformation T (x) = Mx and calculate 
the area multiplier determined by the transformation T. 


20 
1 w= (2°) 
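For Exercises 5-9, draw the image of the unit cube under the (3D) transformation defined by T(x) = Mx
and calculate the volume multiplier determined by the transformation T.


5. $M = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 3 \end{pmatrix}$

6. $M = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

7. $M = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 2 & 1 \\ 1 & 1 & 0 \end{pmatrix}$

8. $M = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 3 & 3 \end{pmatrix}$

9. $M = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$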


Find the determinant of the matrices in Exercises 10-22. 


(1,3) 
(1) 
i. 22) 
16 ('9) 


5 1 
Te. a 3 
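19. $\begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ -1 & -1 & 1 \end{pmatrix}$

20. $\begin{pmatrix} 5 & 5 & 5 \\ 5 & -5 & 5 \\ -5 & -5 & 5 \end{pmatrix}$

21. $\begin{pmatrix} 1 & 1 & -2 \\ 2 & -1 & 2 \\ 2 & 1 & -4 \end{pmatrix}$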
22. $\begin{pmatrix} 1 & 2 & 2 \\ 1 & -1 & 1 \\ -2 & 2 & -4 \end{pmatrix}$

23. Complete the proof of Theorem 4.5.11 by showing that if the reduced echelon form of a matrix
A has a row of all zeros, then the reduced echelon form of AB also has a row of all zeros for any
matrix B. How does this complete the proof?

24. Using the method following Definition 4.5.5, show that if
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]
then det(αA) = α² det(A).

25. Using the method following Definition 4.5.5, show that if
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]
then det(αA) = α³ det(A).

26. Given an n × n matrix A, show that det(αA) = αⁿ det(A).

27. Using the method following Definition 4.5.5, show that if
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]
then det(A) = det(Aᵀ).

28. Using the method following Definition 4.5.5, show that if
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]
then det(A) = det(Aᵀ).

29. Prove Theorem 4.5.19: given an n × n matrix A, show that det(A) = det(Aᵀ).

30. Let A be an n × n matrix, and let B be the matrix obtained by multiplying the kth column of A
by a factor α. Give both a geometric explanation and an algebraic explanation for the equation
det(B) = α det(A).

31. Using the method following Definition 4.5.5, show that
\[
\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc.
\]
(We calculated this determinant using cofactor expansion in the text, so you are being asked to use
the row reduction method for this calculation.)

32. Prove Theorem 4.5.9: the determinant of an n × n matrix A is zero if and only if the reduced
echelon form of A has a row of all zeros.

33. Suppose an n × n matrix A has determinant zero. Geometrically, we conclude that the map T_A
maps the unit cube to a degenerate parallelepiped; that is, a parallelepiped of zero volume.
(a) What can you conclude about the span of the set {Ae₁, Ae₂, ..., Aeₙ}?
(b) What can you conclude about the linear independence or dependence of the set {Ae₁, Ae₂, ...,
Aeₙ}?

[Hint: it may be helpful to look at an example, such as Example 4.5.8.]


34. True or False: If A and B are two n x n matrices then det(A + B) = det(A) + det(B). If true, 
give a proof, and if false, give an example showing that it is false. 

35. Prove Theorem 4.5.20: Given an n × n matrix A, the matrix equation Ax = 0 has one solution if
and only if det(A) ≠ 0.


4.6 Explorations: Re-Evaluating Our Tomographic Goal 


In Section 4.1 we learned how to construct radiographs from objects given a known radiographic 
scenario. This first example of a transformation is a function which takes object vectors (in the object 
vector space) and outputs radiograph vectors (in the radiograph vector space). We observed that these 
types of transformations have the important property that linear combinations are preserved: T(αx +
βy) = αT(x) + βT(y). In Section 4.2 we formalized this idea by defining a linear transformation
between vector spaces. This linearity property is an important tool in analyzing radiographs and
understanding the objects of which they are images. In this section, we continue our examination 
of radiographic transformations. 


4.6.1 Seeking Tomographic Transformations 


It is quite natural, but unfortunately incorrect, to assume that once the action of a transformation is 
known that we can follow this action backwards to find the object that produced a given radiograph. 
Here we begin to explore why this backward action is not so straightforward. 

For any radiographic scenario, we have a transformation T : V → W, where V is the vector space of
objects and W is the vector space of radiographs. Our ultimate goal is to find a transformation S : W →
V which “undoes” the action of T; that is, we seek S such that S ∘ T is the identity transformation.
This idea is illustrated in the following table. Radiography is the process of obtaining a radiograph b. 
Tomography is the process which seeks to reverse the process by finding an object x which provides 
radiograph b. 


Process        Given           Action    Result
Radiography    object x        T         radiograph b = T(x)
Tomography     radiograph b    S         object x = S(b)

Notice that (S ∘ T)(x) = S(T(x)) = S(b) = x; thus, S ∘ T is the identity transformation I : V →
V. We do not yet have any guarantee that such a transformation S exists. We can only explore the
possibility of S by carefully understanding T. Two key questions rise to the top. 


1. Could two different objects produce the same radiograph?
If so, then S may have no way to determine which object is the correct object.


2. Is it possible to have a radiograph which could not possibly come from any object?
If so, then S may be unable to point to any reasonable object at all.


In our explorations, we will make use of the following definitions. 


Definition 4.6.1


Consider a vector space (V, +, ·) of objects to be radiographed and a vector space (W, +, ·) of
radiographs. Suppose there exists a radiographic transformation T : V → W. Then,


1. The zero object is the object 0 ∈ V whose voxel intensities are all zero.

2. The zero radiograph is the radiograph 0 ∈ W whose pixel intensities are all zero.

3. A nonzero object is an object x ∈ V for which at least one voxel intensity is nonzero.

4. A nonzero radiograph is a radiograph b ∈ W for which at least one pixel intensity is nonzero.

5. An invisible object is an object x ∈ V which produces the zero radiograph.

6. A possible radiograph is a radiograph b ∈ W for which there exists an object x ∈ V such that
b = T(x).

7. We say that two vectors (objects or radiographs) are identical if all corresponding (voxel or
pixel) intensities are equal.

8. We say that two vectors (objects or radiographs) are distinct if at least one pair of corresponding
(voxel or pixel) intensities are not equal.


4.6.2 Exercises


Consider the three radiographic scenarios shown in Figures 4.24-4.26 which were previously examined
in Section 4.1. Exercises 1 through 10 can be applied to one or more of these scenarios.


• Height and width of the image in voxels: n = 2 (Total voxels N = 4)
• Pixels per view in the radiograph: m = 2
• ScaleFac = 1
• Number of views: a = 2
• Angle of the views: θ₁ = 0°, θ₂ = 90°

Fig. 4.24 Tomographic Scenario A. Objects are in the vector space of 2 × 2 grayscale images. Radiographs are in the
vector space of 2 views each with 2 pixels and the geometry as shown.


• Height and width of the image in voxels: n = 2 (Total voxels N = 4)
• Pixels per view in the radiograph: m = 2
• ScaleFac = √2
• Number of views: a = 2
• Angle of the views: θ₁ = 45°, θ₂ = 135°

Fig. 4.25 Tomographic Scenario B. Objects are in the vector space of 2 × 2 grayscale images. Radiographs are in the
vector space of 2 views each with 2 pixels and the geometry as shown.


• Height and width of the image in voxels: n = 2 (Total voxels N = 4)
• Pixels per view in the radiograph: m = 4
• ScaleFac = √2/2
• Number of views: a = 1
• Angle of the views: θ₁ = 45°

Fig. 4.26 Tomographic Scenario C. Objects are in the vector space of 2 × 2 grayscale images. Radiographs are in the
vector space of 1 view with 4 pixels and the geometry as shown.

Carefully and completely, using linear algebra language, answer the questions. Be sure and 
provide examples and justifications to support your conclusions. 


1. Is it possible for distinct objects to produce identical radiographs? 

2. Are there nonzero invisible objects for this transformation? 

3. Are there radiographs that cannot be the result of the transformation of any object? In other words, 
are there radiographs which are not possible? 


The next three questions consider the deeper implications of your previous conclusions. Be 
creative and use accurate linear algebra language. 


4. If possible, choose distinct objects that produce identical radiographs and subtract them. What is 
special about the resulting object? 

5. Describe, using linear algebra concepts, the set of all invisible objects. Formulate a mathematical 
statement particular to the given transformation. 

6. Similarly, describe the set of possible radiographs. 




The next four questions ask you to dig deeper into the structure of the vector spaces themselves 
and how they relate to the transformation. 


7. Show that the set of all invisible objects is a subspace of V. 

8. Give a basis for the subspace of all invisible objects. 

9. Show that the set of all possible radiographs is a subspace of W. 
10. Give a basis for the set of all possible radiographs. 


Additional questions. 


11. Construct a radiographic scenario, using the same object space as Scenario A, for which there are 
no invisible objects. 

12. Construct a radiographic scenario, using the same object space as Scenario A, for which every 
radiograph is a possible radiograph. 

13. How might it be possible, in a brain scan application, to obtain a radiograph b € W that could not 
possibly be the radiograph of any brain x € V? Give at least three possible reasons. 

14. Make conjectures about and discuss the potential importance of a brain object x € V that contains 
negative intensities. 

15. Discuss the potential importance of knowing which objects are in the subspace of objects invisible 
to brain scan radiography. 


4.7 Properties of Linear Transformations 


In Section 4.6, we saw that certain properties of linear transformations are crucial to understanding our 
ability to perform a tomographic process. In particular, we found that it is possible for (a) two distinct 
brain objects to produce identical radiographs and (b) real radiographic data not to correspond to any 
possible brain object. We want to know if it is possible for an abnormal brain to produce the same 
radiograph as a normal brain. Figure 4.27 shows two brain images which produce identical radiographs 
under the 30-view radiographic scenario described in Appendix A. We notice differences in the density 
variations (shown in darker gray in the left image) across the brain images which are invisible to the 
radiographic process. This means that if we found the difference in these brain images, the difference 
would be a nonzero invisible vector in the vector space of objects. 

In addition, we want to understand the effects that noise or other measurement artifacts have on the 
ability to accurately determine a likely brain object. In these cases, we want to be able to recover the 
same brain object as if noise was never present. In later discussions, we will understand why noise can 
present a challenge (but not an insurmountable challenge) for recovering a meaningful brain image. 

To understand a linear process between two vector spaces, we need to thoroughly understand these 
properties of the transformation. 


4.7.1 One-To-One Transformations


Again, in Figure 4.27, we see that the radiographic transformation can map two distinct objects (domain 
vectors) to the same radiograph (codomain vector). It is important to recognize whether a transformation 
does this. If not and we have an output from the transformation, we can track the output vector back to 
its input vector. This would be ideal when trying to reconstruct brain images. Let us, now, talk about 
these ideal transformations. 
Fig. 4.27 Two different brain images that produce the same radiograph in a particular 30-view radiographic geometry.


Definition 4.7.1 


Let V and W be vector spaces and u, v ∈ V. If, for a transformation T : V → W,


T(u) = T(v) implies that u = v,


then we say that T is one-to-one or that T is a one-to-one transformation.


In Definition 4.7.1, we do not require one-to-one transformations to be linear. A one-to-one 
transformation guarantees that distinct codomain vectors “came from” distinct domain vectors. For 
example, a one-to-one radiographic transformation guarantees that a given radiograph corresponds to 
one, and only one, brain object. 

Consider the schematic pictures in Figure 4.28. The left schematic illustrates the action of a
transformation, T, from a set with six elements, V = {v₁, v₂, v₃, v₄, v₅, v₆}, to a set with seven
elements, W = {w₁, w₂, w₃, w₄, w₅, w₆, w₇}. For example, T(v₁) = w₂. This transformation is
one-to-one because if T(vₖ) = wᵢ and T(vⱼ) = wᵢ, then vₖ = vⱼ for all wᵢ. That is, there is, at most, only one
arrow pointing to each of the wᵢ. The right schematic illustrates the action of a different transformation
T on the same sets. In this case, the transformation is not one-to-one because T(v₁) = T(v₄) with
v₁ ≠ v₄. This is visually seen by two arrows pointing to w₄.


Fig. 4.28 Schematic illustrations of a one-to-one transformation (left) and a transformation that is not one-to-one (right).


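Example 4.7.2 Consider the radiographic transformation of Scenario A, call it T_A, described in
Figure 4.24. The radiograph defined by b₁ = b₂ = b₃ = b₄ = 1 corresponds to at least two objects:
for example, writing each object as its 2 × 2 array of voxel intensities,
\[
T_A \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = T_A \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix},
\]
and both sides are the radiograph with b₁ = b₂ = b₃ = b₄ = 1.

Thus, this transformation is not one-to-one.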
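The claim in Example 4.7.2 can be checked with a few lines of Python. The function `radiograph_A` below is a hypothetical stand-in for T_A: we assume that, with views at 0° and 90° and ScaleFac = 1, each pixel records the sum of the voxels in one column or one row of the 2 × 2 object. The ordering of the pixels is also an assumption, but it does not affect the conclusion.

```python
from fractions import Fraction

def radiograph_A(obj):
    """Assumed model of T_A: the two column sums followed by the two row sums."""
    (a, b), (c, d) = obj
    return (a + c, b + d, a + b, c + d)

half = Fraction(1, 2)
diagonal = ((1, 0), (0, 1))
uniform = ((half, half), (half, half))

print(radiograph_A(diagonal))
print(radiograph_A(uniform))

# Subtracting the two objects gives a nonzero object whose radiograph is zero.
difference = tuple(tuple(x - y for x, y in zip(r1, r2))
                   for r1, r2 in zip(diagonal, uniform))
print(difference, radiograph_A(difference))
```

Under this model the two distinct objects produce the same radiograph, and their difference is a nonzero invisible object in the sense of Definition 4.6.1.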


Example 4.7.3 Let T : ℝ → ℝ be defined by T(x) = 5x. Every vector y in the range of T
corresponds to a unique domain vector x = y/5. That is, if T(x₁) = T(x₂), then 5x₁ = 5x₂, so x₁ = x₂.
Thus, T is one-to-one.


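Example 4.7.4 Let T : ℝ² → ℝ³ be defined by T(x) = Ax, for the following matrix A.
\[
A = \begin{pmatrix} 1 & 0 \\ 1 & 2 \\ 1 & 1 \end{pmatrix}
\]
This transformation is one-to-one. To see this, suppose that there are two vectors
\[
u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \quad \text{and} \quad v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
\]
such that Au = Av. Then we have
\[
Au = \begin{pmatrix} u_1 \\ u_1 + 2u_2 \\ u_1 + u_2 \end{pmatrix} = Av = \begin{pmatrix} v_1 \\ v_1 + 2v_2 \\ v_1 + v_2 \end{pmatrix}.
\]
But then we must have that
\[
u_1 = v_1, \qquad u_1 + 2u_2 = v_1 + 2v_2, \qquad \text{and} \qquad u_1 + u_2 = v_1 + v_2.
\]
So we conclude that u₁ = v₁ and u₂ = v₂. This means that u = v and so T is one-to-one.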
Example 4.7.5 Let T : ℝ² → ℝ³ be defined by T(x) = Ax, for the following matrix A.
\[
A = \begin{pmatrix} 1 & -2 \\ 0 & 0 \\ -1 & 2 \end{pmatrix}
\]
This transformation is not one-to-one. We might try the same method as in the previous example.
Suppose that there are two vectors
\[
u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \quad \text{and} \quad v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
\]
such that Au = Av. Then we have
\[
\begin{pmatrix} u_1 - 2u_2 \\ 0 \\ -u_1 + 2u_2 \end{pmatrix} = \begin{pmatrix} v_1 - 2v_2 \\ 0 \\ -v_1 + 2v_2 \end{pmatrix}.
\]
This means that u₁, u₂, v₁, and v₂ must satisfy
\[
u_1 - 2u_2 = v_1 - 2v_2
\]
and
\[
-u_1 + 2u_2 = -v_1 + 2v_2.
\]
But these two equations are multiples of each other, and both say that u₁ − 2u₂ = v₁ − 2v₂. Therefore,


we cannot conclude u = v. Indeed, for u = 3) and v = (7). Au = Av and we see that T is not 


1 
one-to-one. Many other vectors u and v could act as examples to show that T is not one-to-one. 
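To see concretely how such a pair of vectors arises, here is a short Python sketch. The matrix is the one for which Au has entries u₁ − 2u₂, 0, and −u₁ + 2u₂, as in the display above; the particular vectors `u` and `v` are our own choice for illustration and need not be the pair used in the text.

```python
def apply(A, x):
    """Matrix-vector product Ax, with A given as a list of rows."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)

A = [[1, -2], [0, 0], [-1, 2]]

# Any two distinct vectors with u1 - 2*u2 == v1 - 2*v2 have the same image.
u = (2, 1)  # 2 - 2*1 = 0
v = (0, 0)  # 0 - 2*0 = 0
print(apply(A, u), apply(A, v))
```

Both products are the zero vector even though u ≠ v, which is exactly the failure of the one-to-one property.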


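Example 4.7.6 Let T : P₂(ℝ) → M₂ₓ₃(ℝ) be defined by
\[
T(ax^2 + bx + c) = \begin{pmatrix} a & b & c \\ 0 & 0 & 0 \end{pmatrix}.
\]
Notice that T is a linear transformation. Indeed, let v = a₂x² + a₁x + a₀ and u = b₂x² + b₁x + b₀,
for coefficients aₖ, bₖ ∈ ℝ, k = 0, 1, 2. Then for a scalar α ∈ ℝ,
\[
\begin{aligned}
T(\alpha v + u) &= T\big((\alpha a_2 + b_2)x^2 + (\alpha a_1 + b_1)x + (\alpha a_0 + b_0)\big) \\
&= \begin{pmatrix} \alpha a_2 + b_2 & \alpha a_1 + b_1 & \alpha a_0 + b_0 \\ 0 & 0 & 0 \end{pmatrix} \\
&= \begin{pmatrix} \alpha a_2 & \alpha a_1 & \alpha a_0 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} b_2 & b_1 & b_0 \\ 0 & 0 & 0 \end{pmatrix} \\
&= \alpha \begin{pmatrix} a_2 & a_1 & a_0 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} b_2 & b_1 & b_0 \\ 0 & 0 & 0 \end{pmatrix} \\
&= \alpha T(v) + T(u).
\end{aligned}
\]
Therefore, T is a linear transformation. Next, we show that T is one-to-one. Suppose T(v) = T(u).
We will show that u = v. Notice that T(v) = T(u) means
\[
\begin{pmatrix} a_2 & a_1 & a_0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} b_2 & b_1 & b_0 \\ 0 & 0 & 0 \end{pmatrix}.
\]
Thus,
\[
a_2 = b_2, \quad a_1 = b_1, \quad \text{and} \quad a_0 = b_0.
\]
Thus, v = u.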


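Example 4.7.7 Let T : ℝ² → P₁(ℝ) be defined by
\[
T\begin{pmatrix} a \\ b \end{pmatrix} = ax + b.
\]
We want to determine whether or not T is one-to-one. Suppose
\[
T\begin{pmatrix} a \\ b \end{pmatrix} = T\begin{pmatrix} c \\ d \end{pmatrix}.
\]
Then
\[
ax + b = cx + d.
\]
Matching up like terms gives us that a = c and b = d. That is,
\[
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} c \\ d \end{pmatrix}.
\]
So, T is one-to-one.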


Example 4.7.8 Let T : 𝒥₁₂(ℝ) → ℝ (the transformation from the space of histograms with 12 bins
to the set of real numbers) be defined as: T(J) is the sum of the values assigned to all bins. T is not
one-to-one. We can understand this idea because simply knowing the sum of the values does not allow
us to uniquely describe the histogram that produced the sum.
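A two-line experiment makes the point. In this sketch we represent a 12-bin histogram as a Python list of its 12 bin values (a modeling assumption of ours), and the two histograms are arbitrary examples.

```python
def T(histogram):
    """Sum of the values assigned to all bins."""
    return sum(histogram)

flat = [1] * 12           # every bin has value 1
spike = [12] + [0] * 11   # all of the mass sits in the first bin

print(T(flat), T(spike), flat == spike)
```

The two histograms are distinct, yet T sends both to the same real number, so T cannot be one-to-one.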


Example 4.7.9 The identity transformation, I : V → V, is one-to-one. Indeed, for any v ∈ V, the
only vector u ∈ V for which I(u) = v is u = v.


Example 4.7.10 The zero transformation, 0 : V → W, may or may not be one-to-one. See Exercise 40
for more discussion.


4.7.2 Properties of One-To-One Linear Transformations 


In this section, we will explore connections between one-to-one linear transformations and linear 
independence. In fact, we show linear independence is preserved by linear transformations that are 
one-to-one. But first, we introduce some notation. 
Definition 4.7.11


Let V and W be vector spaces, let S ⊆ V, and let T : V → W be a transformation. We define the
set


T(S) = {T(s) | s ∈ S} ⊆ W.


That is, T(S) is the set of all vectors being mapped to by elements of S. We call this set the image
of S under T.


Example 4.7.12 Consider the linear transformation T : P₂(ℝ) → ℝ³ defined by
T(ax² + bx + c) = (c, b, a + b).


Let S = {1, x, x²} ⊆ P₂(ℝ). We have


T(S) = {T(1), T(x), T(x²)} = {(1, 0, 0), (0, 1, 1), (0, 0, 1)}.
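The image of a set is straightforward to compute by machine. In the sketch below (our own encoding, not notation from the text) a polynomial ax² + bx + c is stored as the coefficient tuple (a, b, c), and `image` applies a transformation to every element of a set, mirroring Definition 4.7.11.

```python
def T(p):
    """T(ax^2 + bx + c) = (c, b, a + b), with p given as the tuple (a, b, c)."""
    a, b, c = p
    return (c, b, a + b)

def image(transformation, S):
    """The image of S: the set of all transformation(s) for s in S."""
    return {transformation(s) for s in S}

S = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}  # the polynomials 1, x, x^2
print(image(T, S))
print(image(T, set()))  # the image of the empty set is empty
```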


Example 4.7.13 Suppose S = ∅. Then T(S) is the set of all vectors in W mapped to by vectors in S.
Since there are no vectors in S, there can be no vectors in T(S). That is, T(S) = ∅.


Example 4.7.14 Consider the vector space V of 7-bar LCD characters. Let T : V → V be the
one-to-one linear transformation which is described by a reflection about the horizontal axis of symmetry.
Let S be the set of LCD characters shown in the figure; then T(S) is the set of their reflections.

[Figure: the LCD characters in S and their reflected images in T(S)]


The next Lemma is a statement about how nested sets behave under linear transformations. This 
will be useful as we work toward our goal of understanding how linearly independent sets behave 
under one-to-one transformations. 


Lemma 4.7.15
Let V and W be vector spaces, and let T : V → W be a transformation. Suppose S₁ and S₂ are
subsets of V such that S₁ ⊆ S₂ ⊆ V. Then T(S₁) ⊆ T(S₂).


Proof. Suppose V and W are vector spaces and T is defined as above. Suppose also that S₁ ⊆ S₂ ⊆ V.
We will show that if w ∈ T(S₁) then w ∈ T(S₂). Suppose w ∈ T(S₁). Then there exists v ∈ S₁ such
that T(v) = w. Since S₁ ⊆ S₂, we know that v ∈ S₂ and, therefore, T(v) ∈ T(S₂). So w ∈ T(S₂).
Thus, T(S₁) ⊆ T(S₂).


Now, we have the tools to verify that linear independence is preserved under one-to-one transformations. 
Lemma 4.7.16

Let V and W be vector spaces. Let S = {v₁, v₂, ..., vₘ} be a linearly independent set
in V, possibly empty. If T : V → W is a one-to-one linear transformation, then T(S) =
{T(v₁), T(v₂), ..., T(vₘ)} is linearly independent.


Proof. Let V and W be vector spaces. Suppose T : V → W is a one-to-one linear transformation.
First, note that if S = ∅ then T(S) = ∅, which is linearly independent.

Now, assume that S is not empty and that S is linearly independent. We want to show that
{T(v₁), T(v₂), ..., T(vₘ)} is linearly independent. Let α₁, α₂, ..., αₘ be scalars and consider the
linear dependence relation
\[
\alpha_1 T(v_1) + \alpha_2 T(v_2) + \ldots + \alpha_m T(v_m) = 0. \tag{4.6}
\]
We wish to show that α₁ = α₂ = ... = αₘ = 0. Because T is linear, we can rewrite the left side of
(4.6) to get
\[
T(\alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_m v_m) = 0.
\]
But, since T(0) = 0, we can rewrite the right side of (4.6) to get
\[
T(\alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_m v_m) = T(0).
\]
Now, since T is one-to-one, we know that
\[
\alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_m v_m = 0.
\]
Finally, since S is linearly independent,
\[
\alpha_1 = \alpha_2 = \ldots = \alpha_m = 0.
\]
Thus, {T(v₁), T(v₂), ..., T(vₘ)} is linearly independent.


Example 4.7.17 In Example 4.7.12 the transformation T is one-to-one because
T(ax² + bx + c) = T(dx² + ex + f) implies that (c, b, a + b) = (f, e, d + e).
Therefore, it must be true that c = f, b = e, and, so, a = d. Lemma 4.7.16 guarantees that the linearly
independent set {1, x, x²} ⊆ P₂(ℝ) is mapped to a linearly independent set, namely, {(1, 0, 0),
(0, 1, 1), (0, 0, 1)} ⊆ ℝ³.


Since linearly independent sets in V form bases for subspaces of V, Lemma 4.7.16 along with 
Exercise 42 from Section 4.2 lead us to question how this affects the dimensions of these subspaces. 
The following lemma answers this question. 
Lemma 4.7.18
Let V and W be vector spaces. Let U ⊆ V be a subspace of dimension m. If T : V → W is a
one-to-one linear transformation, then T(U) ⊆ W is also a subspace of dimension m.


Proof. Let V and W be vector spaces with U ⊆ V, an m-dimensional subspace of V. Let S =
{v₁, v₂, ..., vₘ} be a basis for U and suppose T : V → W is a one-to-one linear transformation. We
will show that T(S) is a basis for the subspace T(U). That is, we need T(S) to be linearly independent
and to span T(U). By Lemma 4.7.16, we already know T(S) is linearly independent. So, we need only
show that T(S) spans T(U).


Let w ∈ T(U). Then, by Definition 4.7.11, there is a u ∈ U so that T(u) = w. Since u ∈ U, there
are scalars α₁, α₂, ..., αₘ so that
\[
u = \alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_m v_m.
\]
So, by linearity of T,
\[
\begin{aligned}
w &= T(\alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_m v_m) \\
&= \alpha_1 T(v_1) + \alpha_2 T(v_2) + \ldots + \alpha_m T(v_m).
\end{aligned}
\]
Thus, w ∈ Span T(S). Therefore, T(U) ⊆ Span{T(v₁), T(v₂), ..., T(vₘ)} = Span T(S). Since
T(S) ⊆ T(U), Span(T(S)) = T(U). Thus, T(S) is a basis for T(U).


The close connection between one-to-one transformations and linear independence leads to the 
following important theorem. 


Theorem 4.7.19

Let $V$ and $W$ be finite-dimensional vector spaces and let $\mathcal{B} = \{v_1, v_2, \dots, v_n\}$ be a
basis for $V$. The linear transformation $T : V \to W$ is one-to-one if and only if $T(\mathcal{B}) =
\{T(v_1), T(v_2), \dots, T(v_n)\}$ is a linearly independent set in $W$.


Proof. Let $V$ and $W$ be finite-dimensional vector spaces and let $\mathcal{B}$ be a basis for $V$. ($\Rightarrow$) Linear
independence of $T(\mathcal{B})$ follows directly from Lemma 4.7.16.

($\Leftarrow$) Now suppose that $\{T(v_1), T(v_2), \dots, T(v_n)\} \subseteq W$ is linearly independent. We want to show
that $T$ is one-to-one. Let $u, v \in V$ so that $T(u) = T(v)$. Then, $T(u) - T(v) = 0$. Thus, $T(u - v) = 0$.
Since $u, v \in V$, there are scalars $\alpha_1, \alpha_2, \dots, \alpha_n$ and $\beta_1, \beta_2, \dots, \beta_n$ so that
\[ u = \alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_n v_n \quad\text{and}\quad v = \beta_1 v_1 + \beta_2 v_2 + \dots + \beta_n v_n. \]
Thus,
\[ T\big((\alpha_1 - \beta_1) v_1 + (\alpha_2 - \beta_2) v_2 + \dots + (\alpha_n - \beta_n) v_n\big) = 0. \]
This leads us to the linear dependence relation
\[ (\alpha_1 - \beta_1) T(v_1) + (\alpha_2 - \beta_2) T(v_2) + \dots + (\alpha_n - \beta_n) T(v_n) = 0. \]
Since $\{T(v_1), T(v_2), \dots, T(v_n)\}$ is linearly independent, we know that
\[ \alpha_1 - \beta_1 = \alpha_2 - \beta_2 = \dots = \alpha_n - \beta_n = 0. \]
That is, $u = v$. Therefore, $T$ is one-to-one.


Example 4.7.20 Consider $\mathcal{B} = \{1, x, x^2, x^3\}$, the standard basis for $P_3(\mathbb{R})$. Let $T : P_3(\mathbb{R}) \to P_3(\mathbb{R})$
be defined by $T(f(x)) = f'(x)$. We have $T(\mathcal{B}) = \{0, 1, 2x, 3x^2\}$ which is linearly dependent. Thus,
$T$ is not one-to-one.
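
The check in Example 4.7.20 can be sketched numerically. The following is a minimal illustration, assuming polynomials in $P_3(\mathbb{R})$ are stored as coefficient lists `[c0, c1, c2, c3]`; this representation and the function name `derivative` are choices made for this sketch, not notation from the text.

```python
# Polynomials in P_3(R) are stored as coefficient lists [c0, c1, c2, c3],
# meaning c0 + c1*x + c2*x^2 + c3*x^3 (an assumption made for this sketch).

def derivative(p):
    """Coefficients of p'(x), padded with a zero so the result stays in P_3(R)."""
    return [k * p[k] for k in range(1, len(p))] + [0]

standard_basis = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
image_of_basis = [derivative(b) for b in standard_basis]

# T(B) contains the zero vector, so T(B) is linearly dependent.
contains_zero = [0, 0, 0, 0] in image_of_basis

# Two distinct polynomials with the same image: x^2 and x^2 + 7.
u = [0, 0, 1, 0]
v = [7, 0, 1, 0]
same_image = derivative(u) == derivative(v)
```

Here `image_of_basis` holds the coefficient lists of $0$, $1$, $2x$, and $3x^2$. A set containing the zero vector is linearly dependent, and the pair `u`, `v` shows directly that two different inputs share an output.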


Example 4.7.21 Consider Example 4.7.6. Using the basis

\[ \mathcal{B} = \{1 + x + x^2,\ 1 + x,\ 1\} \]

for $P_2(\mathbb{R})$ we have

111\ (011) (001 
FE | Gun) arno) (050) 

Since $T(\mathcal{B})$ is linearly independent, $T$ is one-to-one.


Now, for vector spaces $V$ and $W$, let us discuss the connection between one-to-one transformations
$T : V \to W$ and the dimensions of $V$ and $W$.


Theorem 4.7.22
Let $V$ and $W$ be finite-dimensional vector spaces. If $T : V \to W$ is a one-to-one linear
transformation, then $\dim V \le \dim W$.


Proof. Let $V$ and $W$ be vector spaces with bases $\mathcal{B}_V$ and $\mathcal{B}_W$, respectively. Suppose also that $\dim V = n$
and $\dim W = m$. Now, let $T : V \to W$ be a one-to-one linear transformation. By Theorem 4.7.19, we
know that since $\mathcal{B}_V$ is a basis for $V$, $\mathcal{B}_V$ has $n$ elements and $T(\mathcal{B}_V)$ is a linearly independent set
in $W$ with $n$ elements. Using Corollary 3.4.22, we know that $\mathcal{B}_W$ has $m$ elements and any linearly
independent set in $W$ must have at most $m$ elements. Therefore, $n \le m$. So, $\dim V \le \dim W$.


The reader should revisit each example of a one-to-one linear transformation in this section and
verify the conclusion of Theorem 4.7.22. That is, you should check that the dimension of the codomain
in each of these examples is at least as large as the dimension of the domain.


4.7.3 Onto Linear Transformations


Next, we consider whether a transformation has the ability to map to every codomain vector. In the 
radiograph sense, this would mean that every radiograph is possible (see Definition 4.6.1). 




Fig.4.29 Schematic illustrations of two different onto transformations. 


Definition 4.7.23

Let $V$ and $W$ be vector spaces. We say that the transformation $T : V \to W$ maps onto $W$ if for
every vector $w \in W$ there exists $v \in V$ so that $T(v) = w$.
In the case that a transformation $T$ maps onto its codomain, we say that $T$ is an onto transformation.


Definition 4.7.23 does not require an onto transformation to be linear. 

Consider the schematic pictures in Figure 4.29. The left schematic illustrates the action of a
transformation, $T$, from a vector space of six vectors, $V = \{v_1, \dots, v_6\}$, to a vector space of five
vectors, $W = \{w_1, \dots, w_5\}$. For example, one of the vectors in $V$ is mapped to $w_2$. This transformation is onto because every
$w_k$ can be written as $w_k = T(v_j)$ for some $v_j$. The right schematic illustrates the action of a
different transformation on the same vector spaces. This transformation is also onto. But, neither
transformation is one-to-one. If we consider the schematic pictures in Figure 4.28, we can see that
neither transformation is onto because, in the left schematic, there are no vectors in the domain that
map to $w_5$ in the codomain and, in the right schematic, we see that the equations
\[ T(x) = w_5 \quad\text{and}\quad T(y) = w_6 \]
have no solutions.
Let us consider some examples in which we determine whether a transformation is onto.
Let us consider some examples in which we determine whether a transformation is onto. 


Example 4.7.24 Consider the radiographic transformation of Scenario A described in Figure 4.24 and
discussed in Example 4.7.2. There is no possible object which can produce the radiograph defined by
$b_1 = b_2 = b_3 = 0$, $b_4 = 1$. Therefore, this transformation is not onto.


In the schematics above, we see pictures of possible scenarios where a transformation may be one- 
to-one and not onto or one that may be onto, but not one-to-one. Let us consider actual examples of 
these cases. 


Example 4.7.25 Let $T : P_2(\mathbb{R}) \to \mathbb{R}$ be defined by $T(ax^2 + bx + c) = a + b + c$. We know that $T$
is linear (see Exercise 5). Let $u = x^2$ and $v = x$, then $u, v \in P_2(\mathbb{R})$ and $u \ne v$, but $T(u) = T(v)$.
Therefore, $T$ is not one-to-one.

Now, for any $w \in \mathbb{R}$ we know that $wx^2 \in P_2(\mathbb{R})$ and $T(wx^2) = w$. Therefore, $T$ is onto.

Now, we know that there is a linear transformation that is onto, but not one-to-one.


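Example 4.7.26 Let $T : \mathbb{R}^2 \to M_{3\times 2}(\mathbb{R})$ be defined by
\[ T\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a & -b \\ b & a + b \\ 0 & -a \end{pmatrix}. \]

We want to know if $T$ is one-to-one and/or onto. First, we determine whether $T$ is one-to-one. Let
$u = (a, b)^T$ and $v = (c, d)^T$ and suppose that $T(u) = T(v)$:
\[ T\begin{pmatrix} a \\ b \end{pmatrix} = T\begin{pmatrix} c \\ d \end{pmatrix}; \]
then
\[ \begin{pmatrix} a & -b \\ b & a + b \\ 0 & -a \end{pmatrix} = \begin{pmatrix} c & -d \\ d & c + d \\ 0 & -c \end{pmatrix}. \]

Matching up entries gives us $a = c$, $-b = -d$, $b = d$, $a + b = c + d$, $0 = 0$, and $-a = -c$, with
unique solution $a = c$ and $b = d$. Thus, $u = v$ and $T$ is one-to-one.
Next, we determine whether $T$ maps onto $M_{3\times 2}(\mathbb{R})$. Notice that
\[ w = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \end{pmatrix} \in M_{3\times 2}(\mathbb{R}). \]

But, there is no $v \in \mathbb{R}^2$ so that $T(v) = w$ because no vector in the range of $T$ has a nonzero number
as the lower left entry. Thus, $T$ is not onto.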


We now consider examples we saw in Section 4.7.1. 


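Example 4.7.27 As in Example 4.7.7, let $T : \mathbb{R}^2 \to P_1(\mathbb{R})$ be defined by
\[ T\begin{pmatrix} a \\ b \end{pmatrix} = ax + b. \]

If we pick $w \in P_1(\mathbb{R})$, then $w = ax + b$ for some $a, b \in \mathbb{R}$. And, if we let
\[ v = \begin{pmatrix} a \\ b \end{pmatrix} \in \mathbb{R}^2, \]
then $T(v) = w$. Thus, $T$ is onto.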


Example 4.7.28 The identity transformation $I : V \to V$ maps onto $V$ because for every vector $y \in V$
there exists a vector $x \in V$ such that $I(x) = y$, namely, $x = y$.


Example 4.7.29 The zero transformation may or may not be onto. See Exercise 41. 

4.7.4 Properties of Onto Linear Transformations


In this section, we explore linear transformations $T : V \to W$ that map onto $W$ and what we know
about the spaces $V$ and $W$. We found that if $T$ is a one-to-one linear transformation then the dimension
of $V$ is at most the dimension of $W$. We will consider a similar theorem about onto transformations
here.


Lemma 4.7.30
Let $V$ and $W$ be finite-dimensional vector spaces, $T : V \to W$ be an onto linear transformation,
and $S = \{w_1, w_2, \dots, w_m\} \subseteq W$. If $S$ is linearly independent, then there is a linearly independent
set $U = \{v_1, v_2, \dots, v_m\} \subseteq V$ such that $T(U) = S$.


Proof. Suppose $S \subseteq W$ is linearly independent and $T : V \to W$ is linear and onto. Since $T$ maps
onto $W$, there exists $U = \{v_1, v_2, \dots, v_m\} \subseteq V$ so that $T(U) = S$. We will show that $U$ is linearly
independent. Let $\alpha_1, \alpha_2, \dots, \alpha_m$ be scalars so that
\[ \alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_m v_m = 0. \]
Then, by linearity,
\[ T(\alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_m v_m) = 0. \]
That is,
\[ \alpha_1 T(v_1) + \alpha_2 T(v_2) + \dots + \alpha_m T(v_m) = 0. \]
But since $T(U) = S$, we have
\[ \alpha_1 w_1 + \alpha_2 w_2 + \dots + \alpha_m w_m = 0. \]
Finally, since $S$ is linearly independent, $\alpha_1 = \alpha_2 = \dots = \alpha_m = 0$. Thus, $U$ is linearly independent.


Example 4.7.31 Consider the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by $T(x, y, z) = (x, y)$. $T$ is
onto because any vector $(x, y) \in \mathbb{R}^2$ can be mapped to by some vector in $\mathbb{R}^3$. For example, $T(x, y, 4) =
(x, y)$ for all $x, y \in \mathbb{R}$. Let $S = \{(1, 0), (1, 5)\}$. Since $S$ is linearly independent, Lemma 4.7.30 tells
us that there exists a linearly independent set $U \subseteq V$ such that $T(U) = S$. One possibility is $U =
\{(1, 0, 17), (1, 5, 0)\}$.


Unlike Lemma 4.7.16, we do not claim that the converse is true (see Exercise 46). Lemma 4.7.30
tells us that, through an onto linear transformation, every linearly independent set in $W$ is the image of
some linearly independent set in $V$. In particular, a basis of $W$ is the image of a linearly independent set in $V$.
This is more formally written in the following corollary.


Corollary 4.7.32

Let $V$ and $W$ be finite-dimensional vector spaces and let $T : V \to W$ be a linear transformation
that maps onto $W$. If $\mathcal{B}_W$ is a basis for $W$ then there exists a linearly independent set $U \subseteq V$ so
that $T(U) = \mathcal{B}_W$.

Proof. Let $V$ and $W$ be vector spaces with $\dim V = n$ and $\dim W = m$ and let $T : V \to W$ be an
onto linear transformation. Suppose $\mathcal{B}_W$ is a basis for $W$. We will find a set of vectors in $V$ that maps
to $\mathcal{B}_W$ and is linearly independent. Notice that if $m = 0$, then $\mathcal{B}_W = \emptyset$ and $U = \emptyset \subseteq V$ is a linearly
independent set so that $T(U) = \mathcal{B}_W$. Now, suppose $m > 0$.
Assume that
\[ \mathcal{B}_W = \{w_1, w_2, \dots, w_m\}. \]
Because $T$ maps onto $W$, for each $i = 1, 2, \dots, m$ there is a vector $v_i \in V$ so that $T(v_i) = w_i$.
Choose one such vector for each $i$ and let
\[ U = \{v_1, v_2, \dots, v_m\}. \]
Let $\alpha_1, \alpha_2, \dots, \alpha_m$ be scalars and consider the linear dependence relation
\[ \alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_m v_m = 0. \tag{4.7} \]
We will show that $\alpha_1 = \alpha_2 = \dots = \alpha_m = 0$. We know that because $T$ is a function, (4.7) gives us that
\[ T(\alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_m v_m) = T(0). \]
Therefore, because $T$ is linear, we have
\[ \alpha_1 T(v_1) + \alpha_2 T(v_2) + \dots + \alpha_m T(v_m) = 0. \]
But, by the definition of $U$, we have
\[ \alpha_1 w_1 + \alpha_2 w_2 + \dots + \alpha_m w_m = 0. \]
But, since $\mathcal{B}_W$ is a basis, and therefore linearly independent, we know that
\[ \alpha_1 = \alpha_2 = \dots = \alpha_m = 0. \]
Therefore, $U$ is linearly independent.


Using Corollary 4.7.32, we can discuss how the dimensions of V and W are related. 


Theorem 4.7.33
Let $V$ and $W$ be finite-dimensional vector spaces. Let $T : V \to W$ be a linear transformation
that maps onto $W$. Then $\dim V \ge \dim W$.


Proof. Let $T : V \to W$ be onto and linear. Also, let $\dim V = n$ and $\dim W = m$. We know that if $\mathcal{B}_W$
is a basis of $W$ then there are $m$ elements in $\mathcal{B}_W$. By Corollary 4.7.32, there is a linearly independent
subset $S \subseteq V$ with $m$ elements. By Corollary 3.4.22, we know that $m \le n$. Thus, $\dim W \le \dim V$.

4.7.5 Summary of Properties


At this point, we have collected many properties that link linear transformations that are one-to-one and
onto to the domain and codomain spaces. We summarize them here.


Summary of Transformation Properties. Suppose $V$ and $W$ are finite-dimensional vector
spaces and that $T : V \to W$ is a linear transformation. We know that

1. If $T$ preserves linear independence, then $T$ is one-to-one.
2. If $T$ maps some subset of $V$ to a spanning set of $W$, then $T$ is onto. (Exercise 39)

We know that if $T$ is one-to-one then

3. $T$ preserves linear independence.
4. $T$ maps a basis of $V$ to a basis of some subspace of $W$.
5. $\dim V \le \dim W$.

We know that if $T$ is onto then

6. A basis for $W$ is mapped to by some linearly independent set in $V$.
7. $\dim W \le \dim V$.
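
For a transformation $x \mapsto Ax$ between coordinate spaces, the properties above can be tested by counting linearly independent columns of $A$, since the columns are the images of the standard basis vectors. The following is a minimal sketch under that assumption. The helper names `rank` and `classify` are hypothetical, and the word "rank" is used here only as shorthand for the number of linearly independent columns. The two test matrices are coordinate versions of Examples 4.7.25 and 4.7.26, written with respect to bases chosen for this illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Number of linearly independent rows (equivalently columns),
    found by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def classify(matrix):
    """Return (one_to_one, onto) for the map x -> matrix * x from R^n to R^m."""
    m, n = len(matrix), len(matrix[0])
    k = rank(matrix)
    # One-to-one: the n columns are linearly independent (Theorem 4.7.19).
    # Onto: the columns span the m-dimensional codomain.
    return k == n, k == m

# Example 4.7.25 in coordinates: (a, b, c) -> a + b + c.
sum_map = [[1, 1, 1]]

# Example 4.7.26 in coordinates: (a, b) -> (a, -b, b, a + b, 0, -a),
# reading the 3x2 output matrix row by row.
embed_map = [[1, 0], [0, -1], [0, 1], [1, 1], [0, 0], [-1, 0]]

sum_result = classify(sum_map)      # onto, not one-to-one
embed_result = classify(embed_map)  # one-to-one, not onto
```

In both cases the outcome agrees with items 5 and 7: the one-to-one map goes from a 2-dimensional space into a 6-dimensional one, and the onto map goes from a 3-dimensional space to a 1-dimensional one.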


4.7.6 Bijections and Isomorphisms


In Sections 4.7.1 and 4.7.3, we have seen that some linear transformations are both one-to-one and onto. 
The actions of transformations with these properties are the simplest to understand. This statement 
requires evidence. So, let’s, first, consider what this means in the radiographic sense. 

A radiographic scenario with an onto transformation has the following properties.

• Every radiograph is a possible radiograph.
• For every radiograph there exists some brain image which can produce it through the transformation.
• If $T : V \to W$, then $T(V) = W$.
• If $T : V \to W$ and $\mathcal{B}$ is a basis for $V$, then $\operatorname{Span} T(\mathcal{B}) = W$.

A radiographic scenario with a one-to-one transformation has the following properties.

• Distinct brain images produce distinct radiographs.
• Each possible radiograph is the transformation of exactly one brain image.
• If $T : V \to W$, then $\dim T(V) = \dim V$.


Taken together, a radiographic scenario with a transformation that is both one-to-one and onto has the 
key property that every possible radiographic image can be traced back to exactly one brain image. 
Radiographic transformations with this property guarantee, for every possible radiograph, the existence 
of a unique brain image which produces the given radiograph even though we do not yet know how to 
find it. 


Definition 4.7.34 


We say that a linear transformation, $T : V \to W$, is an isomorphism if $T$ is both one-to-one and
onto.

Whenever we have two vector spaces $V$ and $W$ with an isomorphism, $T$, between them, we know
that each vector in $V$ corresponds directly to a vector in $W$. By “correspond,” we mean that the vector,
$v$, in $V$ really acts the same way in $V$ as $T(v)$ acts in $W$. Essentially, these spaces are the same. We
formalize this idea in the following definition.


Definition 4.7.35 


Let $V$ and $W$ be vector spaces and let $T : V \to W$ be an isomorphism. Then we say that $V$ is
isomorphic to $W$ and we write
\[ V \cong W. \]


Example 4.7.36 Combining Examples 4.7.7 and 4.7.27, we see that the transformation $T : \mathbb{R}^2 \to
P_1(\mathbb{R})$, defined by
\[ T\begin{pmatrix} a \\ b \end{pmatrix} = ax + b, \]
is an isomorphism. That means that $\mathbb{R}^2 \cong P_1(\mathbb{R})$. According to our comments above, this means that
elements of $P_1(\mathbb{R})$ “act” like their corresponding elements in $\mathbb{R}^2$.


We already knew that $\mathbb{R}^2$ and $P_1(\mathbb{R})$ act similarly because the coordinate space of $P_1(\mathbb{R})$ is $\mathbb{R}^2$. We
can actually show that the coordinate transformation is, indeed, an isomorphism. (See Theorem 4.7.43
below.)


4.7.7 Properties of Isomorphic Vector Spaces 

Determining whether two finite-dimensional vector spaces are isomorphic seems to hinge on finding
an isomorphism between them. However, reconsider the isomorphism of Example 4.7.36 giving us
that $P_1(\mathbb{R})$ is isomorphic to $\mathbb{R}^2$, $P_1(\mathbb{R}) \cong \mathbb{R}^2$. And, $\dim P_1(\mathbb{R}) = \dim \mathbb{R}^2$. This is not a coincidence.

We will find that the only requirement for two finite-dimensional vector spaces to be isomorphic is
that they have the same dimension.


Theorem 4.7.37
Let $V$ and $W$ be (finite-dimensional) vector spaces. Then, $V \cong W$ if and only if $\dim V = \dim W$.


Proof. ($\Rightarrow$) Suppose that $V \cong W$. Then there exists an isomorphism $T : V \to W$. By Theorem 4.7.22,
we know that
\[ \dim V \le \dim W. \]
Now, by Theorem 4.7.33, we know that
\[ \dim V \ge \dim W. \]
Therefore, we conclude that
\[ \dim V = \dim W. \]
($\Leftarrow$) Now, suppose that $\dim V = \dim W = n$. Suppose also that a basis for $V$ is $\mathcal{B}_V = \{v_1, v_2, \dots,
v_n\}$ and a basis for $W$ is $\mathcal{B}_W = \{w_1, w_2, \dots, w_n\}$. By Theorem 4.2.26, we can define $T : V \to W$ to
be the linear transformation so that
\[ T(v_1) = w_1,\ T(v_2) = w_2,\ \dots,\ T(v_n) = w_n. \]
We will show that $T$ is an isomorphism. We know that if $w \in W$, then
\[ w = \alpha_1 w_1 + \alpha_2 w_2 + \dots + \alpha_n w_n \]
for some scalars $\alpha_1, \alpha_2, \dots, \alpha_n$. We also know that there is a vector $v \in V$, defined by
\[ v = \alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_n v_n \in V. \]
Since $T$ is linear, we have
\[ \begin{aligned} T(v) &= T(\alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_n v_n) \\ &= \alpha_1 T(v_1) + \alpha_2 T(v_2) + \dots + \alpha_n T(v_n) \\ &= \alpha_1 w_1 + \alpha_2 w_2 + \dots + \alpha_n w_n \\ &= w. \end{aligned} \]
Therefore, $T$ maps onto $W$.
Now, suppose that
\[ v = \alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_n v_n \quad\text{and}\quad u = \beta_1 v_1 + \beta_2 v_2 + \dots + \beta_n v_n \]
are vectors in $V$ satisfying $T(v) = T(u)$. We will show that $v - u = 0$. We know that
\[ \begin{aligned} 0 &= T(v) - T(u) \\ &= T(v - u) \\ &= T\big(\alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_n v_n - (\beta_1 v_1 + \beta_2 v_2 + \dots + \beta_n v_n)\big) \\ &= T\big((\alpha_1 - \beta_1) v_1 + (\alpha_2 - \beta_2) v_2 + \dots + (\alpha_n - \beta_n) v_n\big) \\ &= (\alpha_1 - \beta_1) T(v_1) + (\alpha_2 - \beta_2) T(v_2) + \dots + (\alpha_n - \beta_n) T(v_n) \\ &= (\alpha_1 - \beta_1) w_1 + (\alpha_2 - \beta_2) w_2 + \dots + (\alpha_n - \beta_n) w_n. \end{aligned} \]
Now, since $\mathcal{B}_W$ is a basis for $W$, we know that
\[ \alpha_1 - \beta_1 = \alpha_2 - \beta_2 = \dots = \alpha_n - \beta_n = 0. \]
That is to say $u - v = 0$. Therefore, $u = v$ and $T$ is one-to-one. Since $T$ is both one-to-one and onto,
$T$ is an isomorphism and, therefore, $V \cong W$.


Theorem 4.7.37 suggests that bases map to bases through an isomorphism as stated in the following 
theorem. 

Theorem 4.7.38
Let $V$ and $W$ be finite-dimensional vector spaces. Suppose $\mathcal{B}_V$ is a basis for $V$ and let $T : V \to W$
be a linear transformation. Then, $T$ is an isomorphism if and only if $T(\mathcal{B}_V)$ is also a basis of $W$.


Proof. Suppose $V$ and $W$ are finite-dimensional vector spaces. Suppose also that $T : V \to W$ is an isomorphism.
We know from Theorem 4.7.37 that $\dim V = \dim W$, say that $\dim V = n$. Now, define $\mathcal{B}_V =
\{v_1, v_2, \dots, v_n\}$ to be a basis for $V$. By Theorem 3.4.24, we need only show that $T(\mathcal{B}_V)$ is linearly
independent. But, Lemma 4.7.16 tells us that since $\mathcal{B}_V$ is linearly independent, so is $T(\mathcal{B}_V)$. Therefore,
$T(\mathcal{B}_V)$ is a basis of $W$.


Corollary 4.7.39
Let $(V, +, \cdot)$ be a vector space of dimension $n$ with scalars from $\mathbb{F}$. Then $V$ is isomorphic to $\mathbb{F}^n$.


This corollary is true for any scalar field $\mathbb{F}$, for example: $\mathbb{R}$, $\mathbb{C}$, $\mathbb{Z}_2$, etc. This corollary suggests to us
the idea that the most complicated abstract vector spaces (such as image spaces, heat state spaces, 7-bar
LCD character spaces, polynomial spaces, etc.) are isomorphic to the simplest, most familiar spaces
(such as $\mathbb{R}^n$). This is an indication of things to come; perhaps any vector space can be viewed as $\mathbb{R}^n$
through the right lens. After all, two vector spaces are isomorphic if their elements have a one-to-one
relationship.

Let us take a moment to return to the discussion at the beginning of Section 4.5. There,
we discussed matrix transformations $T(x) = Mx$ on a shape in $\mathbb{R}^2$. We found that $\det(M)$ is the factor
by which the area of the shape changes after transforming it by $T$. But, in Exercise 18 of Section 4.2
we considered the transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$, defined as
\[ T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \]
In Figure 4.30, we see that $T$ transforms the unit square to a line segment connecting $(0, 0)$ and $(1, 0)$.
In fact, no matter what shape we transform with $T$, we always get a line segment along the $x$-axis.

Clearly, we have lost information and are not able to reconstruct the original shape. Following the same
ideas in the previous discussion, we calculate
\[ \det\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = 0. \]

This matches what we saw at the beginning of Section 4.5 because the area is reduced to 0. More
important is the information we just gained here. A matrix whose determinant is zero results in a
transformation that is not an isomorphism. We can see this because the transformation results in an
image with smaller dimension. That is, $\dim(T(\mathbb{R}^n)) < n$. As we move forward, we will find this to
be a very useful and quick check when determining whether or not we can get an exact radiographic
reconstruction. Because this will prove to be very useful, we will put it in a box.

Fig.4.30 The transformation $T$ transforms $S$ to a line segment.


Fact 4.7.40

Let $M$ be an $n \times n$ matrix and define a transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ by $T(x) = Mx$. Then
$\det(M) = 0$ means that $T$ is not an isomorphism. In fact, if $\det(M) \ne 0$, we will find that $T$ is
an isomorphism.
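
As a quick illustration of this fact in the $2 \times 2$ case, the sketch below computes the determinant of the projection matrix discussed above and applies the matrix to the corners of the unit square. The helper names `det2` and `apply` are chosen for this sketch only.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

def apply(m, v):
    """Matrix-vector product of a 2x2 matrix with a vector in R^2."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

M = [[1, 0], [0, 0]]
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [apply(M, v) for v in unit_square]

# Distinct corners (1, 0) and (1, 1) land on the same point, so the map
# x -> Mx is not one-to-one and therefore not an isomorphism.
collapsed = apply(M, (1, 0)) == apply(M, (1, 1))
determinant = det2(M)
```

The four corners are sent to only two distinct points on the $x$-axis, which is consistent with a determinant of zero.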


4.7.8 Building and Recognizing Isomorphisms


The proof of Theorem 4.7.37 suggests a tool for creating isomorphisms (if they exist). If we can define 
a linear operator T : V — W which maps a basis for V to a basis for W, then T is an isomorphism 
between V and W. 


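Example 4.7.41 Let $V = M_{2\times 3}$ and $W = P_5$. We know that $V \cong W$ because both are 6-dimensional
vector spaces. Indeed, a basis for $V$ is
\[ \mathcal{B}_V = \left\{ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right\} \]
and a basis for $W$ is
\[ \mathcal{B}_W = \{1, x, x^2, x^3, x^4, x^5\}. \]

Thus, we can create an isomorphism $T$ that maps $V$ to $W$. Using Theorem 4.7.37 we define $T$ as
follows
\[ \begin{aligned} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} &\mapsto 1, & \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} &\mapsto x, & \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} &\mapsto x^2, \\ \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} &\mapsto x^3, & \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} &\mapsto x^4, & \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} &\mapsto x^5. \end{aligned} \]

If we have any vector $v \in V$, we can find to where $T$ maps it in $W$. Since $v \in V$, we know there are
scalars $a, b, c, d, e, f$ so that
\[ \begin{aligned} v = {} & a\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} + b\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} + c\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \\ & + d\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} + e\begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} + f\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \end{aligned} \]
That is,
\[ v = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix}. \]
Thus, since $T$ is linear,
\[ T(v) = a(1) + b(x) + c(x^2) + d(x^3) + e(x^4) + f(x^5) = a + bx + cx^2 + dx^3 + ex^4 + fx^5. \]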
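
The basis-to-basis construction of Example 4.7.41 can be sketched in a few lines. This is an illustration only: it assumes a $2 \times 3$ matrix is stored as a list of two rows and a polynomial in $P_5$ as a list of six coefficients, lowest degree first. The names `T` and `unit_matrix` are chosen for this sketch.

```python
def T(matrix):
    """Send [[a, b, c], [d, e, f]] to [a, b, c, d, e, f], the coefficients of
    a + b*x + c*x^2 + d*x^3 + e*x^4 + f*x^5 (lowest degree first)."""
    return [entry for row in matrix for entry in row]

def unit_matrix(i, j):
    """The 2x3 matrix with a 1 in row i, column j and zeros elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(3)] for r in range(2)]

# The six basis matrices of M_{2x3}, listed row by row.
basis_V = [unit_matrix(i, j) for i in range(2) for j in range(3)]
images = [T(b) for b in basis_V]

# The k-th basis matrix is sent to x^k, i.e. the k-th standard coefficient list.
standard = [[1 if k == n else 0 for n in range(6)] for k in range(6)]
maps_basis_to_basis = images == standard

v = [[1, 2, 3], [4, 5, 6]]
coeffs = T(v)  # coefficients of 1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5
```

Because `T` only rearranges the six entries, it sends the basis of $M_{2\times 3}$ to the basis $\{1, x, \dots, x^5\}$ of $P_5$ in order, which is exactly the condition used to build the isomorphism.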


Example 4.7.42 Consider the radiographic transformation of Scenario A described in Figure 4.24.
We have already seen that the transformation is neither one-to-one nor onto. However, the dimensions
of the object and radiograph spaces are equal. Theorem 4.7.37 tells us that these spaces are isomorphic.
And by Definition 4.7.35 there exists an isomorphism between the two spaces. But, the important thing
to note is that not every linear transformation between the two spaces is an isomorphism.


Consider $\mathcal{I}_{4\times 4}(\mathbb{R})$, the space of $4 \times 4$ grayscale images. Since vectors in this vector space can be
tedious to draw with any accuracy, let’s consider their representation in $\mathbb{R}^{16}$. We choose to attempt this
because these two vector spaces are isomorphic. How do we know?

Let’s reconsider Example 3.5.7. We found that the image $v$, where black indicates a pixel value of zero
and white indicates a pixel value of 3, could be represented as a coordinate vector $[v]_{\mathcal{B}_I}$. We found
\[ [v]_{\mathcal{B}_I} = (3\ 1\ 1\ 2\ 0\ 0\ 2\ 0\ 3\ 0\ 3\ 2\ 1\ 0\ 2\ 3)^T \in \mathbb{R}^{16}. \]

If we let $\mathcal{B}_I = \{b_1, b_2, \dots, b_{16}\}$ and suppose $E_{16} = \{e_1, e_2, \dots, e_{16}\}$ is the standard basis for $\mathbb{R}^{16}$,
then we can define the linear transformation $T : \mathcal{I}_{4\times 4}(\mathbb{R}) \to \mathbb{R}^{16}$ as $T(b_k) = e_k$, for $k = 1, 2, \dots, 16$.
That is, by Theorem 4.7.37, $T$ is the isomorphism that maps a vector to its coordinate representation:
$v \mapsto [v]_{\mathcal{B}_I}$.


Theorem 4.7.43
Let $V$ be an $n$-dimensional vector space with ordered basis $\mathcal{B}$, and let $x \in V$. Then the coordinate
transformation $x \mapsto [x]_{\mathcal{B}}$ is an isomorphism from $V$ to $\mathbb{R}^n$.


Proof. From Theorem 4.2.26 we can define the transformation $T$ by $T(b_k) = e_k$, for $k = 1, 2, \dots, n$,
where $\{e_1, e_2, \dots, e_n\}$ is the standard basis for $\mathbb{R}^n$. Using the proof of Theorem 4.7.37, we know that
$T$ is an isomorphism.


4.7.9 Inverse Transformations 


The exploration of transformations that are one-to-one and/or onto has led us to consider the possibility 
of recovering a domain vector from a codomain vector through understanding the properties of the given 
transformation. For example, we might wonder if we can recover a brain image from a radiograph by 
understanding the properties of the radiographic transformation. This section explores that possibility. 


Definition 4.7.44 


Let $V$ and $W$ be vector spaces and $T : V \to W$. We say that $T$ is invertible if there exists a
transformation $S : W \to V$, called the inverse of $T$, with the property
\[ S \circ T = I_V \quad\text{and}\quad T \circ S = I_W. \]
If such a transformation exists, we write
\[ S = T^{-1}. \]


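Example 4.7.45 Consider the transformation $T : \mathbb{R} \to \mathbb{R}$ defined by $T(x) = 3x + 2$. The
inverse transformation $S : \mathbb{R} \to \mathbb{R}$ is defined by $S(y) = \frac{1}{3}(y - 2)$, because
\[ S(T(x)) = S(3x + 2) = \tfrac{1}{3}\big((3x + 2) - 2\big) = x \]
and
\[ T(S(y)) = T\left(\tfrac{1}{3}(y - 2)\right) = 3\left(\tfrac{1}{3}(y - 2)\right) + 2 = y. \]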
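
The two compositions in Example 4.7.45 can be spot-checked with exact arithmetic. The sketch below is only a sanity check on a few sample values (chosen arbitrarily for this illustration), not a proof; the algebra above is what establishes the identities for all real numbers.

```python
from fractions import Fraction

def T(x):
    """The transformation T(x) = 3x + 2."""
    return 3 * x + 2

def S(y):
    """The proposed inverse S(y) = (1/3)(y - 2)."""
    return Fraction(1, 3) * (y - 2)

samples = [Fraction(-5), Fraction(0), Fraction(1, 2), Fraction(7)]

# S undoes T and T undoes S on every sample value.
round_trip_ST = all(S(T(x)) == x for x in samples)
round_trip_TS = all(T(S(y)) == y for y in samples)
```

Using `Fraction` avoids floating-point rounding, so the equality tests compare exact values.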


Example 4.7.46 Consider the polynomial differentiation transformation $T : P_2(\mathbb{R}) \to P_1(\mathbb{R})$ defined
by $T(ax^2 + bx + c) = 2ax + b$. We can see that the polynomial integration operator $U : P_1(\mathbb{R}) \to
P_2(\mathbb{R})$, defined by $U(dx + e) = \frac{1}{2}dx^2 + ex$, is not the inverse transformation of $T$. We see that
$U(T(ax^2 + bx + c)) = ax^2 + bx \ne ax^2 + bx + c$ for all $c \ne 0$. But, $T(U(dx + e)) = dx + e$. The
issue here is that there are infinitely many quadratic polynomials whose derivatives are all the same.


Example 4.7.47 Consider $D(\mathbb{Z}_2)$, the vector space of 7-bar LCD images, and the transformation
$T : D(\mathbb{Z}_2) \to D(\mathbb{Z}_2)$ defined by


T(x) =ax2+ 


In this case, $T^{-1} = T$ because


=o-+ bes 

Example 4.7.48 The identity transformation $I : V \to V$ is its own inverse, $I^{-1} = I$.


It is not surprising that invertible transformations are isomorphisms. 


Theorem 4.7.49
Let $V$ and $W$ be finite-dimensional vector spaces and let $T : V \to W$ be a linear transformation.
Then $T$ is invertible if and only if $T$ is an isomorphism.


Proof. Let $V$ and $W$ be finite-dimensional vector spaces. Suppose $T : V \to W$ is linear. ($\Rightarrow$) First,
we show that if $T$ is invertible, then $T$ is an isomorphism. Suppose $T$ is invertible. Then there exists
an inverse transformation $S$. We will show that $T$ is both one-to-one and onto.

(Onto) Consider arbitrary $w \in W$. Define $v = S(w)$, then
\[ T(v) = T(S(w)) = I(w) = w. \]
Thus, there is a $v \in V$ so that $T(v) = w$. Therefore, $T$ maps onto $W$.

(One-to-one) Let $u, v \in V$ be vectors with $T(u) = T(v)$. We will show that this implies $u = v$.
Because $S$ is a function, we know that
\[ u = I(u) = S(T(u)) = S(T(v)) = I(v) = v. \]
Thus, $T$ is one-to-one.
($\Leftarrow$) Now we show that if $T$ is an isomorphism, then $T$ is invertible. Suppose $\dim V = n$. Since
$T$ is one-to-one, Theorems 4.7.19 and 4.7.38 give that if $\mathcal{B}_V = \{v_1, v_2, \dots, v_n\}$ is a basis for $V$ then
$T(\mathcal{B}_V) = \mathcal{B}_W$ is a basis of $W$. Then
\[ \mathcal{B}_W = \{w_1 = T(v_1),\ w_2 = T(v_2),\ \dots,\ w_n = T(v_n)\}. \]
By Theorem 4.2.26, we can define a linear transformation $S : W \to V$ by
\[ S(w_1) = v_1,\ S(w_2) = v_2,\ \dots,\ S(w_n) = v_n. \]

Now, let $u \in V$ and $z \in W$ be vectors. We know that there are scalars $\alpha_1, \alpha_2, \dots, \alpha_n$ and
$\beta_1, \beta_2, \dots, \beta_n$ so that
\[ u = \alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_n v_n \quad\text{and}\quad z = \beta_1 w_1 + \beta_2 w_2 + \dots + \beta_n w_n. \]

We also know, by Theorem 4.2.21, that $T \circ S$ and $S \circ T$ are linear. We will show that $(T \circ S)(z) = z$
and $(S \circ T)(u) = u$. We have, using linearity and the definitions of $\mathcal{B}_W$ and $S$,
\[ \begin{aligned} (T \circ S)(z) &= T(S(z)) \\ &= T(S(\beta_1 w_1 + \beta_2 w_2 + \dots + \beta_n w_n)) \\ &= \beta_1 T(S(w_1)) + \beta_2 T(S(w_2)) + \dots + \beta_n T(S(w_n)) \\ &= \beta_1 T(v_1) + \beta_2 T(v_2) + \dots + \beta_n T(v_n) \\ &= \beta_1 w_1 + \beta_2 w_2 + \dots + \beta_n w_n \\ &= z. \end{aligned} \]
We also have
\[ \begin{aligned} (S \circ T)(u) &= S(T(u)) \\ &= S(T(\alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_n v_n)) \\ &= \alpha_1 S(T(v_1)) + \alpha_2 S(T(v_2)) + \dots + \alpha_n S(T(v_n)) \\ &= \alpha_1 S(w_1) + \alpha_2 S(w_2) + \dots + \alpha_n S(w_n) \\ &= \alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_n v_n \\ &= u. \end{aligned} \]

Therefore, $S = T^{-1}$ and so $T$ is invertible.


We have noticed that a linear transformation between isomorphic vector spaces may not be an
isomorphism and therefore it will not be invertible. The next corollary helps complete this discussion
by stating that invertible linear transformations only exist between vector spaces of the same dimension
(between isomorphic vector spaces).


Corollary 4.7.50 


Let $V$ and $W$ be finite-dimensional vector spaces and $T : V \to W$ linear. If $T$ is invertible, then
$\dim V = \dim W$.


Proof. The result follows directly from Theorem 4.7.49 and Theorem 4.7.37. 


Example 4.7.51 The radiographic scenarios of Figures 4.24-4.26 feature transformations from the 
vector space of 2 x 2 images (dimension 4) to vector spaces of radiographic images (each of dimension 
4). Yet, none of the transformations is invertible. Each transformation is neither one-to-one nor onto. 


To complete the introduction to inverses, we present, in the next theorem, some useful properties. 


Theorem 4.7.52

Let $T : V \to W$ be an invertible linear transformation. Then

(a) $(T^{-1})^{-1} = T$,

(b) $T^{-1}$ is linear, and

(c) $T^{-1}$ is an isomorphism.

The proof of this theorem is the subject of Exercises 53, 54, and 55.


4.7.10 Left Inverse Transformations 


In general, radiographic transformations are not transformations between vector spaces of equal
dimension, so we cannot expect invertibility. If such a transformation were invertible, and if we could
determine the details of the inverse transformation, then for any brain image $x$ with radiograph $b$ we
have $T(x) = b$ and more importantly $x = T^{-1}(b)$.

However, invertibility is actually more than we require. We really only hope to recover a brain image
from a given radiograph, one that is the output of a radiographic transformation.⁵ That is, suppose $V$ is
the space of brains and $W$ is the space of radiographs and that we have a radiographic transformation
$T : V \to W$ and another transformation $S : W \to V$ such that $S(T(x)) = x$ for all brains $x \in V$.
Then we recover the brain $x$ by $x = S(T(x)) = S(b)$ from the radiograph $b = T(x)$. In this section,
we discuss this “almost inverse” transformation.


Definition 4.7.53 


Let $V$ and $W$ be vector spaces and $T : V \to W$. Then $S : W \to V$ is called a left inverse of $T$ if
\[ S \circ T = I_V. \]


If a transformation has a left inverse then a domain object can be uniquely recovered from the 
codomain object to which it maps. 


Example 4.7.54 The integration transformation $U$ from Example 4.7.46 has a left inverse transformation,
the differentiation transformation $T$. That is, given a polynomial $p \in P_1(\mathbb{R})$, we have, for $p = ax + b$,
\[ \begin{aligned} (T \circ U)(p) &= T(U(p)) \\ &= T(U(ax + b)) \\ &= T\left(\tfrac{1}{2}ax^2 + bx\right) \\ &= ax + b. \end{aligned} \]

So, $T(U(p)) = p$ for all $p \in P_1(\mathbb{R})$.
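
The asymmetry between a left inverse and a full inverse can be seen by composing the two maps in both orders. The sketch below is an illustration under an assumed representation: a polynomial $ax^2 + bx + c$ is stored as the tuple `(a, b, c)` and $dx + e$ as `(d, e)`. The sample polynomials are arbitrary.

```python
from fractions import Fraction

def T(p):
    """Differentiate a*x^2 + b*x + c, stored as (a, b, c); returns (d, e) for d*x + e."""
    a, b, c = p
    return (2 * a, b)

def U(q):
    """Integrate d*x + e, stored as (d, e), choosing constant term zero."""
    d, e = q
    return (Fraction(d, 2), e, 0)

q = (4, -3)      # the polynomial 4x - 3 in P_1(R)
p = (1, 5, 9)    # the polynomial x^2 + 5x + 9 in P_2(R)

# T undoes U: differentiating the antiderivative returns the original.
T_after_U = T(U(q))

# U does not undo T: the constant term 9 is lost.
U_after_T = U(T(p))
```

Here `T_after_U` equals `q`, while `U_after_T` is the tuple for $x^2 + 5x$, not $x^2 + 5x + 9$. So $T$ is a left inverse of $U$, but $U$ is not a left inverse of $T$.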


The following theorem tells us one way in which we can identify a linear transformation that has a 
left inverse. 


Theorem 4.7.55
Let $T : V \to W$ be a linear transformation. Then $T$ is one-to-one if and only if $T$ has a left
inverse.


5 Later, we will discuss outputs that are the result of a corrupted radiographic process. That is, a radiograph that, due to 
noise, is not in the range of the radiographic transformation. 

The proof can be extracted from relevant parts of the proof of Theorem 4.7.49. See also Exercise 56.


Example 4.7.56 Consider $\mathcal{H}_3(\mathbb{R})$, the vector space of histograms with 3 bins, and the transformation
$T : \mathcal{H}_3(\mathbb{R}) \to \mathbb{R}^4$ defined by
\[ T(H) = \begin{pmatrix} h_1 + h_2 + h_3 \\ h_1 - h_3 \\ h_2 - h_1 \\ h_3 - h_2 \end{pmatrix}, \]
where $H$ is a histogram (vector) and $h_1, h_2, h_3 \in \mathbb{R}$ are the three ordered values assigned to the
histogram bins. We are interested in whether or not $T$ has a left inverse. If so, then any vector in $\mathbb{R}^4$
(which is the result of the transformation of a histogram) uniquely defines that histogram.

We can determine if such a left inverse transformation exists by determining whether or not $T$ is
one-to-one. Consider two vectors $T(H)$ and $T(J)$ given by
\[ T(H) = \begin{pmatrix} h_1 + h_2 + h_3 \\ h_1 - h_3 \\ h_2 - h_1 \\ h_3 - h_2 \end{pmatrix} \quad\text{and}\quad T(J) = \begin{pmatrix} j_1 + j_2 + j_3 \\ j_1 - j_3 \\ j_2 - j_1 \\ j_3 - j_2 \end{pmatrix}. \]

If $T(H) = T(J)$, it is a matter of some algebra to show that indeed, $h_1 = j_1$, $h_2 = j_2$, and $h_3 = j_3$.
That is, $H = J$. Thus, $T$ is one-to-one and a left inverse transformation exists.


Corollary 4.7.57
Let $T : V \to W$ be a linear transformation. If $T$ has a left inverse, then $\dim V \le \dim W$.


The proof follows directly from Theorems 4.7.22 and 4.7.55. 
Now before we complete this section we discuss the language introduced. 


Watch Your Language! We have discussed several topics in this section. It is important, when discussing
these topics, to recognize what the terminology is describing: vector spaces, transformations, or vectors. Be
careful not to apply, for example, a term describing transformations to vector spaces. Here are appropriate
uses of the terminology in this section.

✓ $T : V \to W$ maps $V$ onto $W$.
✓ $T : V \to W$ is a one-to-one transformation.
✓ $T : V \to W$ is an isomorphism.
✓ $V$ is isomorphic to $W$.
✓ $V$ and $W$ are isomorphic vector spaces.
✓ $T^{-1} : W \to V$ is the inverse of $T$.
✓ If $T^{-1}$ is the inverse of $T$ and $T(x) = y$, we can solve for $x$ by applying the transformation $T^{-1}$ to
both sides to get $T^{-1}(T(x)) = T^{-1}(y) = x$.

It is inappropriate to say the following.

✗ $T : V \to W$ maps $v \in V$ onto $w \in W$.
✗ $V$ and $W$ are one-to-one to each other.
✗ $T : V \to W$ is isomorphic.
✗ $V$ and $W$ are isomorphisms.
✗ $V$ and $W$ are equal/equivalent/the same.
✗ $T^{-1}$ is the reciprocal of $T$.
✗ If $T^{-1}$ is the inverse of $T$ and $T(x) = y$, we can solve for $x$ by dividing both sides by $T$ to get
$x = \frac{y}{T}$.


In a typical radiographic scenario, we wish to obtain high-resolution images in our brain image space,
say $V$. We also want to achieve this goal without requiring excessive radiographic data, taking as few
images as possible for our radiographic image space, say $W$. That is, we hope that $\dim W \ll \dim V$
(that is, $\dim W$ is much less than $\dim V$). Unfortunately, that also means that the transformation will
not be one-to-one and we will not have a left inverse. This means that we require more linear algebra
tools to accomplish our ultimate tomographic goals.


4.7.11 Exercises


1. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $T(x) = Ax$, where $A$ is the following matrix:


1 0 
(411) 
(a) Sketch the image of the unit square (vertices at (0,0), (1,0), (1, 1), and (0, 1)) under the 
transformation T. 
(b) Based on your sketch, determine whether $T$ is one-to-one, and whether $T$ is onto. Your
answer should include a brief discussion of your reasoning, based on the geometry of the
transformation.


(c) Prove your conjecture about whether T is one-to-one using the definition of a one-to-one map. 
(d) Prove your conjecture about whether T is onto using the definition of an onto map. 


2. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $T(x) = Ax$, where $A$ is the following matrix:


1/2 -1/2 
(01) 
(a) Sketch the image of the unit square (vertices at (0,0), (1,0), (1, 1), and (0, 1)) under the 
transformation T. 
(b) Based on your sketch, determine whether $T$ is one-to-one, and whether $T$ is onto. Your
answer should include a brief discussion of your reasoning, based on the geometry of the
transformation.


(c) Prove your conjecture about whether T is one-to-one using the definition of a one-to-one map. 
(d) Prove your conjecture about whether T is onto using the definition of an onto map. 


For Exercises 3-11 (recall the linear transformations from Section 4.2, Exercises 1 to 12), determine
whether the transformation is one-to-one and/or onto. Prove your answer.


3. Let $T : \mathbb{R}^4 \to \mathbb{R}^3$ be defined by $T(x) = Ax$, where $A$ is the following matrix:

\[ A = \begin{pmatrix} 1 & -2 & 0 & 0 \\ -3 & 0 & -1 & 1 \\ 0 & -6 & -1 & 1 \end{pmatrix} \]


. Define f : R? + R? by f(v) = Mv + x, where 


121 1 
m=(134) and ace 


5. Define $F : V \to P_1$, where

\[ V = \{ ax^2 + (3a - 2b)x + b \mid a, b \in \mathbb{R} \} \subseteq P_2, \]

by

\[ F(ax^2 + (3a - 2b)x + b) = 2ax + 3a - 2b. \]


. Define G : Pz > Mo x2 by 


2 - a a-—b 
G(ax torta= (7,272). 


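7. Define $h : V \to P_1$, where

\[ V = \left\{ \begin{pmatrix} a & b & c \\ 0 & b - c & 2a \end{pmatrix} \;\middle|\; a, b, c \in \mathbb{R} \right\} \subseteq M_{2 \times 3}, \]

by

\[ h\begin{pmatrix} a & b & c \\ 0 & b - c & 2a \end{pmatrix} = ax + c. \]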
Let 
3a 
b 2a 
T= 0 a,b,cER 
c b 
3c 


And define f : Z > P2 by 
f(1) = ax? + (b+c0)x + (atc). 


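9. Define $f : M_{2 \times 2} \to \mathbb{R}^4$ by

\[ f\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a \\ b + a \\ b \\ c \end{pmatrix}. \]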
Fig. 4.31 Example of averaging heat state transformation.


10. Define $f : P_2 \to \mathbb{R}^2$ by

\[ f(ax^2 + bx + c) = \begin{pmatrix} a + b \\ a - c \end{pmatrix}. \]


11. Let $H_4$ be the set of all possible heat states sampled every 1 cm along a 5 cm long rod. Define a
function $T : H_4 \to H_4$ by replacing each value (which does not correspond to an endpoint) with
the average of its neighbors. The endpoint values are kept at 0. An example of $T$ is shown in
Figure 4.31.


For Exercises 12-23, determine whether a transformation described can be created. If so, create it 
and prove that the transformation you created satisfies the description. If not, prove that no such 
transformation exists. 


12. 
13. 
14. 
15. 
16. 
17. 
18. 
19. 
20. 
21. 
22. 


23. 


An onto transformation that maps from R? to R?. 

A one-to-one transformation that maps from R? to R>. 

An onto transformation that maps from R? to P. 

A one-to-one transformation that maps from R? to P}. 

A one-to-one transformation that maps R? to Mox 3. 

An onto transformation that maps R? to M 2x3. 

A one-to-one transformation that maps M23 to R?. 

An onto transformation that maps M23 to R?. 

An onto transformation that maps P; to V = {(x, y, z)|x +y+z= 0}. 

A one-to-one transformation that maps P| to V = {(x, y, z)|x + y+z =O}. 

A one-to-one transformation that maps D(Z2) (the vector space of 7-bar LCD images) to H7 (the 
vector space of heat states sampled 7 times along a rod). 

An onto transformation that maps D(Z2) (the vector space of 7-bar LCD images) to H7 (the vector 
space of heat states sampled 7 times along a rod). 


For Exercises 24-33 determine whether or not the transformation is an isomorphism, then determine 
whether it has an inverse. If not, determine whether it has a left inverse. Justify your conclusion. 


24. $T : V_O \to V_R$, where $V_O$ is the space of objects with 4 voxels and $V_R$ is the space of radiographs
with 4 pixels and




t+ 2X2 
Baal v3 
+4 
T 2 2 
tT | X4 321 + Uq 4 304 
1 ail 
321 + 2X3 + 3X4 


25. $T : V_O \to \mathbb{R}^4$, where $V_O$ is the space of objects with 4 voxels and


Ty X3 Baal 2X3 


B 


where B is the standard basis for Vo. 


a 
: 4 ab = b 

26. T: Moyx2 > R defined by 7 (2) = ae 
c-—d 
a 

: 4 ab\ | b+1 

27. T: Max2 > R defined by 7 (4 ,) =I 5p 96 
d 


28. $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ defined by $T(ax^2 + bx + c) = cx^2 + ax + b$.

29. $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ defined by $T(ax^2 + bx + c) = (a + b)x^2 - b + c$.

30. Let $n \in \mathbb{N}$ and let $T : P_n(\mathbb{R}) \to P_n(\mathbb{R})$ be defined by $T(p(x)) = p'(x)$.

31. $T : H_4(\mathbb{R}) \to \mathbb{R}^4$ defined by $T(v) = [v]_Y$, where $Y$ is the basis given in Example 3.4.15.

32. $T : D(Z_2) \to D(Z_2)$ defined by $T(x) = x + x$.

33. The transformation of Exercise 12 in Section 4.2 on heat states. 


In Exercises 34-38, determine whether each pair of vector spaces is isomorphic. If so, find an 
isomorphism between them. If not, justify. 


34. IR? and P;(R) 

35. R2 and Mox3 

36. Pe(R) and M2 x1 
37. D(Z2) and Z3 

38. D(Z2) and P2(Z2) 


Additional Questions 


39. Let $V$ and $W$ be finite-dimensional vector spaces. Suppose $T : V \to W$ is linear and $T(U) = S$
for some $U \subseteq V$ and some spanning set $S$ of $W$ (that is, $\operatorname{Span} S = W$). Prove or disprove that $T$ maps $V$ onto $W$.

40. Show that the zero transformation $Z : V \to W$, defined by $Z(v) = 0$ for all $v \in V$, can be
one-to-one.

41. Give an example to show that the zero transformation can be onto.

42. (The composition of one-to-one maps is one-to-one.) Let $T : U \to V$ and $S : V \to W$ be
one-to-one linear maps between vector spaces. Prove that the composition $S \circ T : U \to W$ is one-to-one.

43. (The composition of onto maps is onto.) Let $T : U \to V$ and $S : V \to W$ be onto linear maps
between vector spaces. Prove that the composition $S \circ T : U \to W$ is onto.

44. Consider the vector space $D(Z_2)$ of 7-bar LCD characters and the linear transformation $T : D(Z_2) \to
D(Z_2)$ defined by $T(x) = x + x$. Determine whether $T$ is one-to-one and/or onto.


45. Consider the heat state transformation illustrated in Fig. 4.31. Determine if this transformation
is one-to-one and/or onto.

46. Prove that the converse of Lemma 4.7.30 is false.

47. Consider the Radiography/Tomography application that we have been exploring. We want to be
able to recover brain images from radiographs. Based on the discussion in this section, why is it
important to know whether the radiographic transformation is one-to-one? Discuss what is needed
in order for a radiographic setup to be a one-to-one transformation or discuss why this is not
possible.

48. Consider the Radiography/Tomography application that we have been exploring. We want to be
able to recover brain images from radiographs. Based on the discussion in this section, why is it
important to know whether the radiographic transformation is onto? Discuss what is needed in
order for a radiographic setup to be an onto transformation or discuss why this is not possible.

49. Prove or disprove the following claim.

Claim: Suppose $T : V \to W$ is an onto linear transformation. Then $T(V) = W$.

50. Prove or disprove the following claim.

Claim: Suppose $T : V \to W$ is a linear transformation that maps $V$ onto $W$ and that $\mathcal{B}$ is a basis
for $V$. Then $\operatorname{Span} T(\mathcal{B}) = W$.

51. Prove or disprove the following claim.

Claim: Suppose $T : V \to W$ is a one-to-one linear transformation. Then $\dim T(V) = \dim V$.

52. Determine whether or not a transformation $T : V \to \{0\}$ can be an isomorphism. If so, state the
conditions under which this can happen. If not, justify.

53. Prove Theorem 4.7.52(a).

54. Prove Theorem 4.7.52(b).

55. Prove Theorem 4.7.52(c).

56. Prove Theorem 4.7.55.

57. Let $V$ and $W$ be finite-dimensional vector spaces. Prove that if $\dim V = \dim W$ and $T : V \to W$
is an onto linear transformation, then $T$ is an isomorphism.

58. Let $V$ and $W$ be finite-dimensional vector spaces. Prove that if $\dim V = \dim W$ and $T : V \to W$
is a one-to-one linear transformation, then $T$ is an isomorphism.

59. If the results in Exercises 57 and 58 are true, how would this help when creating an isomorphism?

60. If the results in Exercises 57 and 58 are true, how would this help when determining whether a
transformation is an isomorphism?

61. Let $S : V \to W$ be a one-to-one linear transformation and $T : V \to W$ be an onto linear transformation.
Suppose that $Q : W \to X$ and $R : U \to V$ are arbitrary linear transformations (you have no
information about whether $R$ or $Q$ is one-to-one or onto).

(a) Is $S \circ R$ always one-to-one?
(b) Is $Q \circ S$ always one-to-one?
(c) Is $T \circ R$ always onto?
(d) Is $Q \circ T$ always onto?




Invertibility 


Our exploration of radiographic transformations has led us to make some key correspondences in our 
quest to find the best possible brain image reconstructions. 


• Every one-to-one transformation has the property that no two different brain images produce
the same radiograph.

• Every onto transformation has the property that every radiograph corresponds to at least one brain
image.

• Every one-to-one transformation has a left inverse, which can be used to correctly reconstruct a
brain image from any radiograph.


We want to know whether we can invert the radiographic transformation (see Figure 5.1). 


Fig. 5.1 In the radiography exploration, we ask, “Given this radiograph, what does the brain that produced it look like?”


Now, we have also found that invertible transformations are both one-to-one and onto, but 
radiographic transformations are not expected to have these properties in any practical scenario. 

In this chapter, we explore the matrix representations of transformations and the corresponding 
vector spaces (domain and codomain). We need to understand the properties of these vector spaces 
(brain image space and radiograph space) in order to fully appreciate the complexity of our tomography 
goal. This study leads us to understand a variety of properties of invertible transformations and invertible 
matrices. In the final section, we will perform our first tomographic reconstructions using a left-inverse 
matrix representation of our radiographic transformation. 
5.1 Transformation Spaces


We have come a long way in understanding the radiography/tomography problem. We understand that 
brain images and radiographs can be viewed as vectors in their respective vector spaces with associated 
arithmetic. We have found that these vectors can be described by linear combinations of a smaller subset 
of vectors. We are able to efficiently define subspaces of vectors using spanning sets or bases. We have 
gained an understanding of the radiographic transformation that summarizes the physical process of 
imaging a brain to produce a radiograph. We found that the process, even though described by a linear 
operation with simple properties, was prone to having both invisible objects and radiographs that are 
not possible. And then our first attempts at determining which brain image produces a given radiograph 
have ended in some disappointment. In fact, for a transformation to be invertible, it must be between 
vector spaces of equal dimension—a condition that we do not expect in any practical situation. But 
then we noticed that all we really need is for our transformation to have a left inverse. Unfortunately, 
we found that only one-to-one transformations have left inverses—again a condition which we do not 
expect. Finally, even if our transformation is onto, this does not guarantee that we can recover the brain 
image that produced our radiograph. 

However, we also made the strong connection between transformations and their matrix representations.
Any properties of the linear radiographic transformation $T$ are mimicked in the properties of the
corresponding matrix representation $M$. Is there information in $M$ that can shed light on more
useful properties of $T$? We are excited to offer the encouraging answer, “Yes, there is useful information
in $M$ to shed light on our problem!”

In this section, we consider the radiographic transformation restricted to subspaces to obtain an 
invertible transformation. The remaining catch is that we still need these two subspaces to be sufficiently 
rich and descriptive for practical tomography. It will take us several more sections to explore this idea 
of practicality. 


5.1.1 The Nullspace 


We begin our exploration by considering objects invisible to the radiographic transformation. Any one- 
to-one transformation cannot have invisible objects (except the zero vector). That is, whatever domain 
subspace we choose cannot contain invisible objects. This approach may seem counterproductive, but 
right now, we need to more deeply understand our transformation. 

We saw that if two brain images produce the same radiograph, then the difference image is an 
invisible brain image. More to the point, from the radiograph itself, there is no way to determine which 
brain image is the best one—and there may be infinitely many to choose from. If the difference image 
contains information about a brain abnormality then our choice is critical from a medical standpoint. 
We will explore these invisible objects in a general vector space setting. 


Definition 5.1.1 


Let $V$ and $W$ be vector spaces. The nullspace of a linear transformation, $T : V \to W$, written
$\operatorname{Null}(T)$, is the subset of $V$ that maps, through $T$, to $0 \in W$. That is,

\[ \operatorname{Null}(T) = \{ v \in V \mid T(v) = 0 \}. \]


The nullspace of a transformation contains all of the domain vectors that map to the zero vector in 
the codomain under the transformation. These vectors are invisible to the transformation. 




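Example 5.1.2 Define $T : \mathbb{R}^3 \to \mathbb{R}^2$ by

\[ T\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + z \\ y + z \end{pmatrix}. \]

This transformation is linear: if

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \begin{pmatrix} r \\ s \\ t \end{pmatrix} \in \mathbb{R}^3 \]

and $\alpha \in \mathbb{R}$, then

\begin{align*}
T\left( \alpha \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} r \\ s \\ t \end{pmatrix} \right)
&= T\begin{pmatrix} \alpha x + r \\ \alpha y + s \\ \alpha z + t \end{pmatrix} \\
&= \begin{pmatrix} (\alpha x + r) + (\alpha z + t) \\ (\alpha y + s) + (\alpha z + t) \end{pmatrix} \\
&= \begin{pmatrix} (\alpha x + \alpha z) + (r + t) \\ (\alpha y + \alpha z) + (s + t) \end{pmatrix} \\
&= \begin{pmatrix} \alpha x + \alpha z \\ \alpha y + \alpha z \end{pmatrix} + \begin{pmatrix} r + t \\ s + t \end{pmatrix} \\
&= \alpha \begin{pmatrix} x + z \\ y + z \end{pmatrix} + \begin{pmatrix} r + t \\ s + t \end{pmatrix} \\
&= \alpha T\begin{pmatrix} x \\ y \\ z \end{pmatrix} + T\begin{pmatrix} r \\ s \\ t \end{pmatrix}.
\end{align*}

We now examine the nullspace of $T$:

\begin{align*}
\operatorname{Null}(T) &= \{ v \in \mathbb{R}^3 \mid T(v) = 0 \} \\
&= \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x + z = y + z = 0,\ x, y, z \in \mathbb{R} \right\} \\
&= \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x = -z,\ y = -z,\ z \in \mathbb{R} \right\} \\
&= \left\{ \begin{pmatrix} -z \\ -z \\ z \end{pmatrix} \;\middle|\; z \in \mathbb{R} \right\} \\
&= \left\{ \alpha \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix} \;\middle|\; \alpha \in \mathbb{R} \right\}.
\end{align*}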
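
The nullspace computed in Example 5.1.2 can be cross-checked by row reduction. The sketch below is illustrative only: it assumes $T$ is represented, relative to the standard bases, by the matrix with rows $(1, 0, 1)$ and $(0, 1, 1)$, and the helper names `rref` and `null_basis` are our own choices rather than notation from the text.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, pivot columns) using exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_basis(rows):
    """One basis vector of the nullspace per free (non-pivot) column."""
    m, pivots = rref(rows)
    ncols = len(rows[0])
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -m[r][free]   # solve the pivot variable in terms of the free one
        basis.append(v)
    return basis

A = [[1, 0, 1],   # first output entry:  x + z
     [0, 1, 1]]   # second output entry: y + z
basis = null_basis(A)
print([[str(x) for x in v] for v in basis])   # prints [['-1', '-1', '1']]
```

The single vector returned, $(-1, -1, 1)$, spans the same line as the nullspace found by hand.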
The name “nullspace” implies the set is actually a subspace. Indeed, we see that the nullspace in
Example 5.1.2 above is a subspace of the domain $\mathbb{R}^3$. We claim this general result in the next theorem.


Theorem 5.1.3
Let $V$ and $W$ be vector spaces and $T : V \to W$ be linear. Then, $\operatorname{Null}(T)$ is a subspace of $V$.


Proof. Suppose $V$ and $W$ are vector spaces and $T : V \to W$ is linear. We will show that $\operatorname{Null}(T)$ is
not empty and that it is closed under vector addition and scalar multiplication.

First, $\operatorname{Null}(T)$ contains $0 \in V$ since $T$ is linear and $T(0) = 0$. Next, consider $u, v \in \operatorname{Null}(T)$
and a scalar $\alpha$. We have $T(\alpha u + v) = \alpha T(u) + T(v) = \alpha(0) + 0 = 0$. So, $\alpha u + v \in \operatorname{Null}(T)$ and by
Corollary 2.5.6, $\operatorname{Null}(T)$ is closed under vector addition and scalar multiplication. Therefore, by
Theorem 2.5.3, $\operatorname{Null}(T)$ is a subspace of $V$.


Definition 5.1.4 


Let $V$ and $W$ be vector spaces and $T : V \to W$ linear. The nullity of $T$, written $\operatorname{Nullity}(T)$, is equal
to $\dim(\operatorname{Null}(T))$.


The nullity of a transformation is just the dimension of the nullspace of the transformation. Most 
English nouns that end in “ity” indicate a quality or degree of something: capacity, purity, clarity. Nullity 
is no exception. In some sense, it measures the variety present in the nullspace of a transformation. The 
larger the nullity, the richer the set of null objects. For example, a radiographic transformation with a 
large nullity will have a greater variety of invisible objects. 


Example 5.1.5 Let $V \subseteq P_2$ be defined by $V = \{ax^2 + (3a - 2b)x + b \mid a, b \in \mathbb{R}\}$. Define the linear
transformation $F : V \to P_1$ by

\[ F(ax^2 + (3a - 2b)x + b) = 2ax + 3a - 2b. \]

Let us now find the nullspace of $F$. Using the definition, we have

\begin{align*}
\operatorname{Null}(F) &= \{ v \in V \mid F(v) = 0 \} \\
&= \{ ax^2 + (3a - 2b)x + b \mid 2ax + 3a - 2b = 0 \} \\
&= \{ ax^2 + (3a - 2b)x + b \mid a = 0,\ b = 0 \} \\
&= \{0\}.
\end{align*}

In this case, $\operatorname{Nullity}(F) = 0$.


When the nullity of a transformation, $T$, is zero, we say that the transformation has a trivial nullspace,
or that the nullspace is the trivial subspace, $\{0\}$.
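Example 5.1.6 Define $h : V \to P_1$, where

\[ V = \left\{ \begin{pmatrix} a & b & c \\ 0 & b - c & 2a \end{pmatrix} \;\middle|\; a, b, c \in \mathbb{R} \right\} \subseteq M_{2 \times 3}, \]

by

\[ h\begin{pmatrix} a & b & c \\ 0 & b - c & 2a \end{pmatrix} = ax + c. \]

Transformation $h$ is linear (which the reader can verify). By definition,

\begin{align*}
\operatorname{Null}(h) &= \{ v \in V \mid h(v) = 0 \} \\
&= \left\{ \begin{pmatrix} a & b & c \\ 0 & b - c & 2a \end{pmatrix} \;\middle|\; h\begin{pmatrix} a & b & c \\ 0 & b - c & 2a \end{pmatrix} = 0;\ a, b, c \in \mathbb{R} \right\} \\
&= \left\{ \begin{pmatrix} a & b & c \\ 0 & b - c & 2a \end{pmatrix} \;\middle|\; ax + c = 0;\ a, b, c \in \mathbb{R} \right\} \\
&= \left\{ \begin{pmatrix} a & b & c \\ 0 & b - c & 2a \end{pmatrix} \;\middle|\; a = 0,\ c = 0,\ b \in \mathbb{R} \right\} \\
&= \left\{ \begin{pmatrix} 0 & b & 0 \\ 0 & b & 0 \end{pmatrix} \;\middle|\; b \in \mathbb{R} \right\} \\
&= \operatorname{Span}\left\{ \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \end{pmatrix} \right\}.
\end{align*}

In this case $\operatorname{Nullity}(h) = 1$ because there is one element in the basis for the nullspace of $h$.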


Example 5.1.7 Consider the transformation from the vector space of six-bin histograms to the vector
space of three-bin histograms, $T : \mathcal{J}_6(\mathbb{R}) \to \mathcal{J}_3(\mathbb{R})$, defined as the operation of bin pair summation
described as follows. Suppose $J \in \mathcal{J}_6(\mathbb{R})$ and $K \in \mathcal{J}_3(\mathbb{R})$. If $K = T(J)$ then the first bin of $K$ has
value equal to the sum of the first two bins of $J$, the second bin of $K$ has value equal to the sum of bins
three and four of $J$, and the third bin of $K$ has value equal to the sum of bins five and six of $J$. The
nullspace of $T$ is the set of all $J \in \mathcal{J}_6(\mathbb{R})$ that map to the zero histogram in $\mathcal{J}_3(\mathbb{R})$. Let the ordered bin
values of $J$ be $\{b_1, b_2, b_3, b_4, b_5, b_6\}$. Then

\[ \operatorname{Null}(T) = \{ J \in \mathcal{J}_6(\mathbb{R}) \mid b_1 + b_2 = b_3 + b_4 = b_5 + b_6 = 0 \}. \]

[Figure: $\operatorname{Null}(T)$ written as the span of three six-bin histograms.]

The nullity of $T$ is 3 because a basis for $\operatorname{Null}(T)$ contains three vectors.
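
The description of the nullspace in Example 5.1.7 can be spot-checked numerically. This is a minimal sketch, assuming a six-bin histogram is stored as the list of its bin values $(b_1, \ldots, b_6)$; the function names and the three candidate histograms are hypothetical choices made for illustration, not necessarily the spanning set pictured in the text.

```python
from fractions import Fraction

def bin_pair_sum(J):
    """Map six bin values to three by summing bins (1,2), (3,4), and (5,6)."""
    return [J[0] + J[1], J[2] + J[3], J[4] + J[5]]

def rank(rows):
    """Rank of a list of vectors via exact forward elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Assumed candidates: each one cancels within a single bin pair.
candidates = [
    [1, -1, 0, 0, 0, 0],
    [0, 0, 1, -1, 0, 0],
    [0, 0, 0, 0, 1, -1],
]

for J in candidates:
    print(J, "->", bin_pair_sum(J))

# The candidates are linearly independent when their rank equals their count.
print("independent:", rank(candidates) == len(candidates))

# A histogram with b1+b2 = b3+b4 = b5+b6 = 0 is fixed by b1, b3, b5 alone.
J = [2, -2, 5, -5, -1, 1]
combo = [2 * a + 5 * b - 1 * c for a, b, c in zip(*candidates)]
print("J is a combination of the candidates:", combo == J)
```

Three independent null histograms that generate every null histogram form a basis, which agrees with a nullity of 3.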


Example 5.1.8 Consider the linear transformation $T : D(Z_2) \to D(Z_2)$, on the space of 7-bar LCD
characters, defined by $T(d) = d + d$. As we have seen previously, $T$ maps every input vector to the
zero vector. So, $\operatorname{Null}(T) = D(Z_2)$ and

\[ \operatorname{Nullity}(T) = \dim \operatorname{Null}(T) = \dim D(Z_2) = 7. \]

Any basis for $D(Z_2)$ is also a basis for $\operatorname{Null}(T)$.
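Example 5.1.9 Consider the radiographic scenario of Figure 4.24. Recall that the transformation $T$ acts on
an object with voxel values $x_1, x_3$ in the top row and $x_2, x_4$ in the bottom row, and produces the
radiograph with the four values

\[ x_1 + x_3, \quad x_2 + x_4, \quad x_1 + x_2, \quad x_3 + x_4. \]

In looking for a nonzero null vector, suppose $x_1 = a \neq 0$. Since all four radiograph values must be zero,
we see that $x_2 = -a$, $x_3 = -a$, and $x_4 = a$. Using similar arguments with all object voxels, we see that
$\operatorname{Null}(T)$ is the span of the single object with voxel values $x_1 = 1$, $x_2 = -1$, $x_3 = -1$, $x_4 = 1$.

We see that $\operatorname{Nullity}(T) = 1$.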


Example 5.1.10 Consider the linear transformation $T : H_4(\mathbb{R}) \to H_4(\mathbb{R})$, on heat states of four
values, defined as the averaging function of Exercise 12 of Section 4.2. $T$ replaces each heat state
value (which does not correspond to an endpoint) with the average of its neighbors. The endpoint
values are kept at 0. An example of a heat state $h$ and the result of this transformation $T(h)$ is shown in
Figure 5.2. The nullspace of $T$ is defined as $\operatorname{Null}(T) = \{h \in H_4(\mathbb{R}) \mid T(h) = 0\}$. It is straightforward
to show that $h = 0$ (the zero vector in $H_4(\mathbb{R})$) is the only heat state which satisfies the criterion
$T(h) = 0$. Thus, $\operatorname{Null}(T) = \{0\}$ and $\operatorname{Nullity}(T) = 0$.


Fig. 5.2 Example of averaging heat state transformation.
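
The claim that only the zero heat state is averaged to zero can be checked directly. The sketch below assumes a heat state in $H_4(\mathbb{R})$ is stored as the list of its four interior temperatures, with both endpoint temperatures fixed at 0; the names `average_neighbors` and `rank` are illustrative.

```python
from fractions import Fraction

def average_neighbors(h):
    """Replace each interior value by the average of its two neighbors.
    Endpoint temperatures are held at 0."""
    padded = [Fraction(0)] + [Fraction(x) for x in h] + [Fraction(0)]
    return [(padded[i - 1] + padded[i + 1]) / 2 for i in range(1, len(padded) - 1)]

def rank(rows):
    """Rank of a list of vectors via exact forward elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

standard_basis = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
images = [average_neighbors(e) for e in standard_basis]
for e, image in zip(standard_basis, images):
    print(e, "->", [str(x) for x in image])

# If the four images are linearly independent, then T(h) = 0 forces h = 0.
trivial_nullspace = rank(images) == 4
print("nullspace is trivial:", trivial_nullspace)
```

Because $T(h)$ is a linear combination of these four images with the entries of $h$ as coefficients, their independence is exactly the statement that $\operatorname{Null}(T) = \{0\}$.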
5.1.2 Domain and Range Spaces


When considering a transformation, we want to know which vectors the transformation can be applied to.
In the case of a radiographic transformation, we wonder what is the shape and size of brain images that
the particular radiographic transformation uses. As with most functions, this set is called the domain.
In linear algebra, we consider only sets that are vector spaces. So, it is often referred to as the domain
space. If $T : V \to W$, we call $V$ the domain space. There is also an ambient space to which all of
the vectors in the domain space map. In the case of a radiographic transformation, this space contains
images that satisfy the definition of a radiograph. We say that the codomain of a linear transformation,
$T : V \to W$, is the ambient vector space $W$ to which domain vectors map. These definitions were first
introduced in Section 4.2.

In Examples 5.1.5 and 5.1.6, the codomain is $P_1$. In Example 5.1.9, the codomain is the space
of 4-value grayscale radiographs defined in Figure 4.24. We have seen that not all transformations
map onto their codomain: not all codomain vectors $b \in W$ can be associated with a domain vector
$x \in V$ such that $T(x) = b$. In the radiographic sense, this means that not all radiographs are possible.
Typically in applications, the set of all possible codomain vectors is more useful than the codomain
itself.


Definition 5.1.11 


We say that the range space of a linear transformation $T : V \to W$, written $\operatorname{Ran}(T)$, is the subset
$T(V) \subseteq W$. That is,

\[ \operatorname{Ran}(T) = \{ T(v) \mid v \in V \}. \]


Notice that, by name, we imply that the range space is a vector space and, in fact, it is. 


Theorem 5.1.12 
Let $V$ and $W$ be vector spaces and let $T : V \to W$ be a linear transformation. Then $\operatorname{Ran}(T)$ is
a subspace of $W$.


The proof is the subject of Exercise 29. See also the proof of Theorem 5.1.3. 
As the dimension of the nullspace of a transformation was given a special name, nullity, the 
dimension of the range space is also given the special name, rank. 


Definition 5.1.13 


Let $V$ and $W$ be vector spaces and $T : V \to W$ linear. We define the rank of the transformation as
the dimension of the range space and write

\[ \operatorname{Rank}(T) = \dim \operatorname{Ran}(T). \]
In both Examples 5.1.5 and 5.1.6, the range is equal to the codomain, $P_1(\mathbb{R})$. In both of these
examples, the rank of the transformation is $\dim P_1(\mathbb{R}) = 2$. Let us use the definition of range to verify
this for Example 5.1.5. We will leave verification for Example 5.1.6 for the reader. Recall, the definition
of $F$ from Example 5.1.5 gives us that

\begin{align*}
\operatorname{Ran}(F) &= \{ F(v) \mid v \in V \} \\
&= \{ 2ax + 3a - 2b \mid a, b \in \mathbb{R} \} \\
&= \{ a(2x + 3) - 2b(1) \mid a, b \in \mathbb{R} \} \\
&= \operatorname{Span}\{ 2x + 3,\ 1 \} \\
&= P_1(\mathbb{R}).
\end{align*}


In general, the range need not equal the codomain. Let’s consider several examples. 


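Example 5.1.14 Define $f : M_{2 \times 2}(\mathbb{R}) \to \mathbb{R}^4$ by

\[ f\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a \\ b + a \\ b \\ c \end{pmatrix}. \]

Determine $\operatorname{Ran}(f)$ and $\operatorname{Null}(f)$. First we find the range.

\begin{align*}
\operatorname{Ran}(f) &= \{ f(v) \mid v \in M_{2 \times 2}(\mathbb{R}) \} \\
&= \left\{ f\begin{pmatrix} a & b \\ c & d \end{pmatrix} \;\middle|\; a, b, c, d \in \mathbb{R} \right\} \\
&= \left\{ \begin{pmatrix} a \\ b + a \\ b \\ c \end{pmatrix} \;\middle|\; a, b, c \in \mathbb{R} \right\} \\
&= \left\{ a \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + b \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix} + c \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} \;\middle|\; a, b, c \in \mathbb{R} \right\} \\
&= \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} \right\}.
\end{align*}

Since this spanning set of three vectors is linearly independent, we have $\operatorname{Rank}(f) = 3$. Now we
compute the nullspace.

\begin{align*}
\operatorname{Null}(f) &= \{ v \in M_{2 \times 2} \mid f(v) = 0 \} \\
&= \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \;\middle|\; a, b, c, d \in \mathbb{R} \text{ and } f\begin{pmatrix} a & b \\ c & d \end{pmatrix} = 0 \right\} \\
&= \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \;\middle|\; a, b, c, d \in \mathbb{R} \text{ and } \begin{pmatrix} a \\ b + a \\ b \\ c \end{pmatrix} = 0 \right\} \\
&= \left\{ \begin{pmatrix} 0 & 0 \\ 0 & d \end{pmatrix} \;\middle|\; d \in \mathbb{R} \right\} \\
&= \operatorname{Span}\left\{ \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\}.
\end{align*}

Thus $\operatorname{Nullity}(f) = 1$.


In this example, the codomain is $\mathbb{R}^4$ and $\operatorname{Ran}(f) \neq \mathbb{R}^4$, which means there are elements in $\mathbb{R}^4$ that are
not mapped to through $f$. That is, $f$ is not onto.
And, there is more than one element in the nullspace. In particular, notice that

\[ f\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = f\begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix} = 0 \quad \text{but} \quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \neq \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}. \]

Thus, $f$ is not one-to-one.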


Since the range of a transformation is the set of all vectors in the codomain that can be mapped to 
by some domain vector, we can express it in terms of a basis of the domain. This can be a very useful 
tool for finding a spanning set for the range. 


Theorem 5.1.15
Let $V$ and $W$ be finite-dimensional vector spaces and $T : V \to W$ be linear. Suppose $\mathcal{B}$ is a basis
for $V$. Then $\operatorname{Span} T(\mathcal{B}) = \operatorname{Ran}(T)$.


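Proof. Let $V$ and $W$ be finite-dimensional vector spaces and let $T : V \to W$ be a linear transformation.
Suppose $\mathcal{B} = \{v_1, v_2, \ldots, v_n\}$ is a basis for $V$ and $T(v_k) = w_k$ for $k = 1, 2, \ldots, n$. For any vector
$u \in V$ we can write $u = \alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_n v_n$ for some scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$. Then,

\begin{align*}
\operatorname{Ran}(T) &= \{ T(x) \mid x \in V \} \\
&= \{ T(\alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_n v_n) \mid \alpha_k \in \mathbb{R},\ k = 1, 2, \ldots, n \} \\
&= \{ \alpha_1 w_1 + \alpha_2 w_2 + \ldots + \alpha_n w_n \mid \alpha_k \in \mathbb{R},\ k = 1, 2, \ldots, n \} \\
&= \operatorname{Span}\{ w_1, w_2, \ldots, w_n \} \\
&= \operatorname{Span} T(\mathcal{B}).
\end{align*}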


It is important to note that the spanning set, T(B), in the proof of Theorem 5.1.15 need not be 
linearly independent. 
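
Theorem 5.1.15 suggests a simple recipe: apply the transformation to each basis vector, then discard redundant images. The sketch below tries this on the map $f$ of Example 5.1.14, assuming the formula $f\begin{pmatrix} a & b \\ c & d \end{pmatrix} = (a,\ a + b,\ b,\ c)$ read off from the range computed there; the helper names `rank` and `independent_subset` are hypothetical.

```python
from fractions import Fraction

def f(M):
    """Assumed formula for f from Example 5.1.14, applied to a 2x2 nested list."""
    (a, b), (c, d) = M
    return [a, a + b, b, c]

def rank(rows):
    """Rank of a list of vectors via exact forward elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            q = m[i][c] / m[r][c]
            m[i] = [x - q * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def independent_subset(vectors):
    """Keep a vector only if it is not in the span of those already kept."""
    chosen = []
    for v in vectors:
        if rank(chosen + [v]) > len(chosen):
            chosen.append(v)
    return chosen

# Standard basis of the 2x2 matrices.
basis = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [1, 0]],
    [[0, 0], [0, 1]],
]
images = [f(M) for M in basis]          # spanning set T(B) for the range
range_basis = independent_subset(images)

print("T(B)      =", images)
print("basis     =", range_basis)
print("Rank(f)   =", len(range_basis))
```

Here the fourth image is the zero vector, so the spanning set is linearly dependent, and reducing it leaves three vectors, in agreement with a rank of 3.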


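Example 5.1.16 Consider the histogram transformation $T$ of Example 5.1.7. The range of $T$ is the
space of all three-bin histograms that can be obtained as bin-pair sums of six-bin histograms. We can
use Theorem 5.1.15 to find a spanning set for $\operatorname{Ran}(T)$, then use the method of spanning set reduction
(see page 143) to find a basis for $\operatorname{Ran}(T)$.

[Figure: a set $\mathcal{B}_6$ of six six-bin histograms.]

The set $\mathcal{B}_6$ is a basis for $\mathcal{J}_6(\mathbb{R})$. Applying the transformation to each basis vector yields

[Figure: the set $T(\mathcal{B}_6)$ of three-bin histograms.]

The set $T(\mathcal{B}_6)$ is linearly dependent. We can extract a basis for $\mathcal{J}_3(\mathbb{R})$ as a subset of $T(\mathcal{B}_6)$. The basis
is

[Figure: a set $\mathcal{B}_3$ of three three-bin histograms.]

We have $\operatorname{Ran}(T) = \operatorname{Span} T(\mathcal{B}_6) = \operatorname{Span} \mathcal{B}_3$ and $\operatorname{Rank}(T) = \dim \operatorname{Ran}(T) = 3$.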


Example 5.1.17 Consider the 7-bar LCD transformation $T$ of Example 5.1.8. The range of $T$ is the
space of all 7-bar LCD characters that are the result of adding a 7-bar LCD character to itself. In
this case, only the zero vector satisfies this requirement. Thus, $\operatorname{Ran}(T) = \{0\}$, $\operatorname{Ran}(T) = \operatorname{Span} \emptyset$, and
$\operatorname{Rank}(T) = 0$.
Example 5.1.18 Consider the heat state transformation $T$ of Example 5.1.10. The range of $T$ is the
space of all 4-value heat states that are the result of the averaging operation $T$ on a 4-value heat state.
We can use Theorem 5.1.15 to find a spanning set for $\operatorname{Ran}(T)$.

[Figure: a set $\mathcal{B}$ of four heat states, each plotted along the rod from $x = 0$ to $x = 5$.]

The set $\mathcal{B}$ is a basis for $H_4(\mathbb{R})$. Applying the averaging transformation to each basis vector yields

[Figure: the set $T(\mathcal{B})$ of four averaged heat states, each plotted along the rod from $x = 0$ to $x = 5$.]

The reader can verify that $T(\mathcal{B})$ is linearly independent and therefore is also a basis for $H_4(\mathbb{R})$. We
have $\operatorname{Ran}(T) = \operatorname{Span} T(\mathcal{B}) = H_4(\mathbb{R})$ and $\operatorname{Rank}(T) = \dim \operatorname{Ran}(T) = 4$.


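Example 5.1.19 Consider the radiographic transformation $T$ of Example 5.1.9.

[Figure: a set $\mathcal{B}$ of four objects.]

The set $\mathcal{B}$ is a basis for $\mathcal{I}_{2 \times 2}(\mathbb{R})$. Applying the radiographic transformation to each basis vector yields a spanning
set for the range of $T$. We can write

[Figure: the set $T(\mathcal{B})$ of four radiographs.]

This set is linearly dependent (the sum of the first and last vectors equals the sum of the second and
third vectors). A basis for $\operatorname{Span} T(\mathcal{B})$ can be found as a subset of $T(\mathcal{B})$. For example,

[Figure: a set $\mathcal{C} \subseteq T(\mathcal{B})$ of three radiographs.]

is a linearly independent set with $\operatorname{Span} T(\mathcal{B}) = \operatorname{Span} \mathcal{C}$. Thus, $\operatorname{Rank}(T) = 3$.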


5.1.3 One-to-One and Onto Revisited


We can continue this discussion again from the point of view of radiography. We saw that some 
transformations are not one-to-one (two different objects have the same radiograph). Also, we found that 
if two objects produce the same radiograph, that their difference would then be invisible. Another way to 
say this is that the difference is in the nullspace of the radiographic transformation. Since the nullspace is 
a vector space, if there is an object that is invisible to the radiographic transformation, any scalar multiple 
of it will also be invisible. It is also noteworthy that if a nonzero object is invisible (meaning both the 
zero object and another object both produce the zero radiograph) then the radiographic transformation 
is not one-to-one. 

Recall that, for a given radiographic transformation, we found radiographs that could not be produced 
from any object. This means that there is a radiograph in the codomain that is not mapped to from the 
domain. These radiographic transformations are not onto. 

We now state the theorems that generalize these results. Our first result gives a statement equivalent 
to being onto. 
Theorem 5.1.20


Let $V$ and $W$ be vector spaces and let $T : V \to W$ be a linear transformation. $T$ is onto if and
only if $\operatorname{Ran}(T) = W$.


Proof. Let $V$ and $W$ be vector spaces and let $T : V \to W$ be linear. Suppose $\operatorname{Ran}(T) = W$. Then, by
definition of $\operatorname{Ran}(T)$, if $w \in W$, there is a $v \in V$ so that $T(v) = w$. Thus $T$ is onto. Now, if $T$ is onto,
then for all $w \in W$ there is a $v \in V$ so that $T(v) = w$. That means that $W \subseteq \operatorname{Ran}(T)$. But, by definition
of $T$ and $\operatorname{Ran}(T)$, we already know that $\operatorname{Ran}(T) \subseteq W$. Thus, $\operatorname{Ran}(T) = W$.


Next, we give an equivalent statement for one-to-one transformations. 


Theorem 5.1.21 


Let $V$ and $W$ be vector spaces. A linear transformation, $T : V \to W$, is one-to-one if and only
if $\operatorname{Null}(T) = \{0\}$.


Proof. Let $V$ and $W$ be vector spaces and let $T : V \to W$ be a linear transformation. Suppose $T$ is
one-to-one and suppose that $u \in \operatorname{Null}(T)$. Then $T(u) = 0$. But, $T(0) = 0$. So, since $T$ is one-to-one,
we know that $u = 0$. Thus, $\operatorname{Null}(T) = \{0\}$. Now, suppose $\operatorname{Null}(T) = \{0\}$. We want to show that $T$ is
one-to-one. Notice that if $u, v \in V$ satisfy

\[ T(u) = T(v) \]

then

\[ T(u) - T(v) = 0. \]

But since $T$ is linear this gives us that

\[ T(u - v) = 0. \]

Thus, $u - v \in \operatorname{Null}(T)$. But $\operatorname{Null}(T) = \{0\}$. Thus, $u - v = 0$. That is, $u = v$. So, $T$ is one-to-one.


Corollary 5.1.22
A linear transformation, $T : V \to \operatorname{Ran}(T)$, is an isomorphism if and only if $\operatorname{Null}(T) = \{0\}$.


Proof. First, suppose $T$ is an isomorphism. Then, by Theorem 5.1.21, $\operatorname{Null}(T) = \{0\}$. Next, suppose
$\operatorname{Null}(T) = \{0\}$. Then, by Theorem 5.1.21, $T$ is one-to-one. And by Theorem 5.1.20, $T$ is onto. Thus,
$T$ is an isomorphism.


Theorems 5.1.20 and 5.1.21 and Corollary 5.1.22 give us tools to determine whether a transformation
is one-to-one and/or onto. Let’s consider several examples.


Example 5.1.23 Consider again the example with $V = \{ax^2 + (3a - 2b)x + b \mid a, b \in \mathbb{R}\} \subseteq P_2$ and
$F : V \to P_1$ defined by

\[ F(ax^2 + (3a - 2b)x + b) = 2ax + 3a - 2b. \]

We showed in Example 5.1.5 that $\operatorname{Null}(F) = \{0\}$. Thus $F$ is one-to-one. We also saw that $\operatorname{Ran}(F) = P_1$.
Thus, the range and codomain of $F$ are the same. And, so we know $F$ is onto. But now we know that
$F$ is an isomorphism. This means that $V \cong P_1$. And, $\dim V = 2$, $\operatorname{Nullity}(F) = 0$, and $\operatorname{Rank}(F) = 2$.


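Example 5.1.24 Define $h : V \to P_1$, where

\[ V = \left\{ \begin{pmatrix} a & b & c \\ 0 & b - c & 2a \end{pmatrix} \;\middle|\; a, b, c \in \mathbb{R} \right\} \subseteq M_{2 \times 3}, \]

by

\[ h\begin{pmatrix} a & b & c \\ 0 & b - c & 2a \end{pmatrix} = ax + c. \]

We found that

\[ \operatorname{Null}(h) = \operatorname{Span}\left\{ \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \end{pmatrix} \right\}. \]

Thus, $h$ is not one-to-one. But, we also noted (again, be sure you know how to show this) that $\operatorname{Ran}(h) =
P_1$. Thus, $h$ is onto. And, $\dim V = 3$, $\operatorname{Nullity}(h) = 1$, and $\operatorname{Rank}(h) = 2$.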


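Example 5.1.25 Define $g : V \to \mathbb{R}^3$, where $V = P_1$, by

\[ g(ax + b) = \begin{pmatrix} a \\ b \\ a + b \end{pmatrix}. \]

Notice that

\begin{align*}
\operatorname{Null}(g) &= \{ ax + b \mid a, b \in \mathbb{R},\ g(ax + b) = 0 \} \\
&= \left\{ ax + b \;\middle|\; a, b \in \mathbb{R},\ \begin{pmatrix} a \\ b \\ a + b \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \right\} \\
&= \{ ax + b \mid a = 0,\ b = 0 \} \\
&= \{0\}.
\end{align*}

Thus, $g$ is one-to-one. Now we find the range space.

\begin{align*}
\operatorname{Ran}(g) &= \{ g(ax + b) \mid a, b \in \mathbb{R} \} \\
&= \left\{ \begin{pmatrix} a \\ b \\ a + b \end{pmatrix} \;\middle|\; a, b \in \mathbb{R} \right\} \\
&= \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \right\}.
\end{align*}

Since $\operatorname{Rank}(g) = 2$ and $\dim \mathbb{R}^3 = 3$, $\mathbb{R}^3 \neq \operatorname{Ran}(g)$ and thus $g$ is not onto. And, $\dim V = 2$, $\operatorname{Nullity}(g) = 0$,
and $\operatorname{Rank}(g) = 2$.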
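
The two conclusions of Example 5.1.25 can be tested with a rank computation. This is a sketch under stated assumptions: the polynomial $ax + b$ is stored as the coefficient pair $(a, b)$, the function `g` encodes the formula $g(ax + b) = (a, b, a + b)$, and the vector `w` is one hypothetical witness chosen to show that $g$ is not onto.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors via exact forward elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            q = m[i][c] / m[r][c]
            m[i] = [x - q * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def g(a, b):
    """Image of the polynomial ax + b, as a vector in R^3."""
    return [a, b, a + b]

images = [g(1, 0), g(0, 1)]     # images of the basis {x, 1} of P_1

# Independent images of a basis mean only the zero polynomial maps to zero.
one_to_one = rank(images) == len(images)
# The range is all of R^3 only if the images span a 3-dimensional space.
onto = rank(images) == 3

# A vector lies in the range exactly when adjoining it does not raise the rank.
w = [0, 0, 1]
w_in_range = rank(images + [w]) == rank(images)

print("one-to-one:", one_to_one)
print("onto:", onto)
print("w in Ran(g):", w_in_range)
```

The vector $w = (0, 0, 1)$ fails the range test because any image $(a, b, a + b)$ with $a = b = 0$ has third entry 0.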


Example 5.1.26 Consider the histogram rebinning transformation of Examples 5.1.7 and 5.1.16. We
found that $\operatorname{Nullity}(T) = 3$, $\operatorname{Ran}(T) = \mathcal{J}_3(\mathbb{R})$, and $\operatorname{Rank}(T) = 3$. Thus, $T$ is not one-to-one but $T$ is
onto.


Example 5.1.27 Consider the LCD character transformation of Examples 5.1.8 and 5.1.17. We found
that $\operatorname{Nullity}(T) = 7$, $\operatorname{Ran}(T) = \{0\}$, and $\operatorname{Rank}(T) = 0$. Thus, $T$ is neither one-to-one nor onto.


Example 5.1.28 Consider the heat state averaging transformation of Examples 5.1.10 and 5.1.18.
We found that $\operatorname{Nullity}(T) = 0$, $\operatorname{Ran}(T) = H_4(\mathbb{R})$, and $\operatorname{Rank}(T) = 4$. Thus, $T$ is an isomorphism.


Example 5.1.29 Consider the radiographic transformation of Examples 5.1.9 and 5.1.19. We found
that $\operatorname{Nullity}(T) = 1$. Also, $\operatorname{Rank}(T) = 3$ while the dimension of the radiograph space is 4. Thus, $T$ is
neither one-to-one nor onto.


5.1.4 The Rank-Nullity Theorem 


In each of the last examples of the previous section, you will notice a simple relationship: the dimension 
of the nullspace and the dimension of the range space add up to the dimension of the domain. This is 
not a coincidence. In fact, it makes sense if we begin putting our theorems together. 
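
The relationship noted here can be tabulated for several of the preceding examples. The sketch below is illustrative: each transformation is replaced by an assumed coordinate matrix (for instance, $F$ sends the coefficient pair $(a, b)$ to $(2a, 3a - 2b)$, and the radiograph matrix assumes the four values $x_1 + x_3$, $x_2 + x_4$, $x_1 + x_2$, $x_3 + x_4$ of Example 5.1.9), and the helper names are our own. Row reduction counts pivot columns for the rank and free columns for the nullity, so the sum is built into the method; the code illustrates the pattern rather than proving it.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, pivot columns) using exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                q = m[i][c]
                m[i] = [a - q * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_basis(rows):
    """One basis vector of the nullspace per free (non-pivot) column."""
    m, pivots = rref(rows)
    ncols = len(rows[0])
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -m[r][free]
        basis.append(v)
    return basis

def apply(rows, v):
    """Matrix-vector product."""
    return [sum(Fraction(a) * b for a, b in zip(row, v)) for row in rows]

half = Fraction(1, 2)
examples = {
    "F (Example 5.1.23)": [[2, 0], [3, -2]],
    "h (Example 5.1.24)": [[1, 0, 0], [0, 0, 1]],
    "g (Example 5.1.25)": [[1, 0], [0, 1], [1, 1]],
    "histogram (Example 5.1.26)": [[1, 1, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0],
                                   [0, 0, 0, 0, 1, 1]],
    "heat state (Example 5.1.28)": [[0, half, 0, 0],
                                    [half, 0, half, 0],
                                    [0, half, 0, half],
                                    [0, 0, half, 0]],
    "radiograph (Example 5.1.29)": [[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [1, 1, 0, 0],
                                    [0, 0, 1, 1]],
}

results = {}
for name, A in examples.items():
    _, pivots = rref(A)
    kernel = null_basis(A)
    # Every reported null vector really is sent to zero.
    assert all(all(x == 0 for x in apply(A, v)) for v in kernel)
    results[name] = (len(A[0]), len(pivots), len(kernel))
    print(f"{name}: dim V = {len(A[0])}, rank = {len(pivots)}, nullity = {len(kernel)}")
```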


Theorem 5.1.30 (Rank-Nullity)
Let $V$ and $W$ be finite-dimensional vector spaces and let $T : V \to W$ be a linear transformation.
Then

\[ \dim V = \operatorname{Rank}(T) + \operatorname{Nullity}(T). \]


Proof. Let $V$ and $W$ be finite-dimensional vector spaces and let $T : V \to W$ be linear. Let $\mathcal{B} = \{v_1, v_2, \ldots, v_n\}$ be a basis for $V$. We will consider the case when $\operatorname{Ran}(T)$ contains only the zero vector and the case when $\operatorname{Null}(T)$ contains only the zero vector separately. This is to remove doubt that may arise when considering empty bases for these subspaces.

First, we consider the case when $\operatorname{Ran}(T) = \{0\}$. Then, a basis for $\operatorname{Ran}(T)$ is the empty set, so $\operatorname{Rank}(T) = 0$. We also know that if $v \in V$ then $T(v) = 0$. So, $T(\mathcal{B}) = \{0\}$. Thus, $\mathcal{B} \subseteq \operatorname{Null}(T)$ is a basis for the nullspace of $T$ and $\operatorname{Nullity}(T) = n$. Thus, $\operatorname{Rank}(T) + \operatorname{Nullity}(T) = n = \dim V$.

Next, we consider the case when $\operatorname{Null}(T) = \{0\}$. In this case, the basis for $\operatorname{Null}(T)$ is the empty set and $\operatorname{Nullity}(T) = 0$. Now, we refer to Theorems 4.7.19, 5.1.15, and 5.1.21. We then know that $\{T(v_1), T(v_2), \ldots, T(v_n)\}$ is linearly independent and we also know that $\operatorname{Span}\{T(v_1), T(v_2), \ldots, T(v_n)\} = \operatorname{Ran}(T)$. Thus, $\{T(v_1), T(v_2), \ldots, T(v_n)\}$ is a basis for $\operatorname{Ran}(T)$ and $\operatorname{Rank}(T) = n$. Thus, $\operatorname{Rank}(T) + \operatorname{Nullity}(T) = n$.

Finally, we consider the case where $\operatorname{Rank}(T) = m \geq 1$ and $\operatorname{Nullity}(T) = k \geq 1$. Let
\[
\mathcal{B}_N = \{v_1, v_2, \ldots, v_k\}
\]
be a basis for $\operatorname{Null}(T)$. Since $\dim V = n \geq k$ (see Corollary 3.4.22), we can add $n - k$ vectors to $\mathcal{B}_N$ that maintain linear independence and therefore form a basis for $V$. That is, we can create the set
\[
\mathcal{B} = \{v_1, v_2, \ldots, v_k, v_{k+1}, v_{k+2}, \ldots, v_n\}
\]
so that $\mathcal{B}$ is a basis for $V$. Then
\[
T(\mathcal{B}) = T(\mathcal{B}_N) \cup \{T(v_{k+1}), T(v_{k+2}), \ldots, T(v_n)\} = \{0, T(v_{k+1}), T(v_{k+2}), \ldots, T(v_n)\}.
\]

Let $S = \{T(v_{k+1}), T(v_{k+2}), \ldots, T(v_n)\}$. Notice that $S$ has $n - k$ elements in it and that $\operatorname{Span} S = \operatorname{Span} T(\mathcal{B})$. We will now show that $S$ is a basis for $\operatorname{Ran}(T)$. In doing this, we will have shown that $n - k = \operatorname{Rank}(T)$, that is, $n - k = m$.

By Theorem 5.1.15, $\operatorname{Ran}(T) = \operatorname{Span} T(\mathcal{B}) = \operatorname{Span} S$. From Lemma 3.4.23, we know that a basis for $\operatorname{Ran}(T)$ is a minimal spanning set for $\operatorname{Ran}(T)$. So $n - k \geq \operatorname{Rank}(T) = m$.

Now, we show that $S$ is linearly independent. Let $\alpha_{k+1}, \alpha_{k+2}, \ldots, \alpha_n$ be scalars so that
\[
\alpha_{k+1} T(v_{k+1}) + \alpha_{k+2} T(v_{k+2}) + \ldots + \alpha_n T(v_n) = 0.
\]
Then, using linearity of $T$, we have
\[
T(\alpha_{k+1} v_{k+1} + \alpha_{k+2} v_{k+2} + \ldots + \alpha_n v_n) = 0.
\]
So, we see that $\alpha_{k+1} v_{k+1} + \alpha_{k+2} v_{k+2} + \ldots + \alpha_n v_n$ is in the nullspace of $T$. But, $\operatorname{Null}(T) = \operatorname{Span} \mathcal{B}_N$. This means that we can describe $\alpha_{k+1} v_{k+1} + \alpha_{k+2} v_{k+2} + \ldots + \alpha_n v_n$ using a linear combination of the basis elements of $\operatorname{Null}(T)$. That is,
\[
\alpha_{k+1} v_{k+1} + \alpha_{k+2} v_{k+2} + \ldots + \alpha_n v_n = \beta_1 v_1 + \beta_2 v_2 + \ldots + \beta_k v_k
\]
for some scalars $\beta_1, \beta_2, \ldots, \beta_k$. Rearranging this equation gives us the following linear dependence relation, for the vectors in $\mathcal{B}$,
\[
\beta_1 v_1 + \beta_2 v_2 + \ldots + \beta_k v_k - \alpha_{k+1} v_{k+1} - \alpha_{k+2} v_{k+2} - \ldots - \alpha_n v_n = 0.
\]
Since $\mathcal{B}$ is a basis for $V$, we know that the above equation is true only when
\[
\beta_1 = \beta_2 = \ldots = \beta_k = \alpha_{k+1} = \alpha_{k+2} = \ldots = \alpha_n = 0.
\]
So, by definition, $S$ is linearly independent. From Corollary 3.4.22, we know that a basis of $\operatorname{Ran}(T)$ is the largest linearly independent set in $\operatorname{Ran}(T)$. That means that $n - k \leq \operatorname{Rank}(T) = m$. Putting our two results together, we find that $S$ is a basis for $\operatorname{Ran}(T)$ and $n - k = m$. Rearranging this equation gives us our result: $\operatorname{Rank}(T) + \operatorname{Nullity}(T) = \dim V$.


For a geometric visualization of Theorem 5.1.30, we encourage the reader to recall Figure 3.1 and 
then work through Exercise 30. The proof of the Rank-Nullity Theorem shows us that we can create 
a basis for V that can be separated into a basis for the nullspace and a set that maps to a basis for the 
range space. 
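
The counting in Theorem 5.1.30 can be checked numerically on small matrices. The sketch below is not part of the text's development: it assumes a transformation is already represented by a matrix, and the helper names `rref` and `rank_and_nullity` are ours. The matrix `G` represents $g$ from Example 5.1.25 under the assumed bases $\{x, 1\}$ and the standard basis of $\mathbb{R}^3$, while `H` is a hypothetical map from $\mathbb{R}^3$ to $\mathbb{R}^2$.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce a matrix; return (reduced matrix, pivot column indices)."""
    A = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(A[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue  # no leading entry in this column
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [x / lead for x in A[r]]
        for i in range(len(A)):
            if i != r:
                f = A[i][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == len(A):
            break
    return A, pivots

def rank_and_nullity(rows):
    """Rank = number of leading entries; nullity = number of free columns."""
    _, pivots = rref(rows)
    free = [c for c in range(len(rows[0])) if c not in pivots]
    return len(pivots), len(free)

# Matrix of g from Example 5.1.25 (bases assumed as described above).
G = [[1, 0], [0, 1], [1, 1]]
# A hypothetical map from R^3 to R^2, not taken from the text.
H = [[1, 0, 1], [0, 1, 1]]

for name, mat in (("G", G), ("H", H)):
    rank, nullity = rank_and_nullity(mat)
    print(name, "dim V =", len(mat[0]), "rank =", rank, "nullity =", nullity)
```

In both cases the rank and nullity add up to the number of columns, which is the dimension of the domain.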


Corollary 5.1.31

Let $V$ and $W$ be finite-dimensional vector spaces and let $T : V \to W$ be a linear transformation. Let $\mathcal{B}_N = \{v_1, v_2, \ldots, v_k\}$ be a basis for $\operatorname{Null}(T)$. If $\mathcal{B} = \{v_1, v_2, \ldots, v_n\}$ is a basis for $V$, then $\{T(v_{k+1}), T(v_{k+2}), \ldots, T(v_n)\}$ is a basis for $\operatorname{Ran}(T)$.


The proof follows directly from the proof of Theorem 5.1.30. Theorem 5.1.30 is useful in 
determining rank and nullity, along with proving results about subspaces. We are often able to determine 
some properties of a transformation without knowing any more than the dimensions of the vector spaces 
from which and to which it maps. Let’s look at an example. 


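Example 5.1.32 Suppose $T : \mathcal{M}_{2\times 5}(\mathbb{R}) \to \mathcal{P}_4(\mathbb{R})$ is a linear transformation. We can show that $T$ cannot be one-to-one. The Rank-Nullity Theorem says that $\dim \mathcal{M}_{2\times 5}(\mathbb{R}) = \operatorname{rank}(T) + \operatorname{nullity}(T)$. Since $\operatorname{rank}(T) \leq \dim \mathcal{P}_4(\mathbb{R}) = 5$ and $\dim \mathcal{M}_{2\times 5}(\mathbb{R}) = 10$, we know that $\operatorname{nullity}(T) \geq 5$. That is, $\operatorname{Null}(T) \neq \{0\}$. So by Theorem 5.1.21, we know that $T$ cannot be one-to-one.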
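
The dimension count in Example 5.1.32 uses only $\dim V$ and $\dim W$, so it can be written as a small helper. This is a minimal sketch; the function names are hypothetical and only encode the inequality $\operatorname{rank}(T) \leq \min(\dim V, \dim W)$ together with the Rank-Nullity Theorem.

```python
def nullity_lower_bound(dim_domain, dim_codomain):
    """Smallest nullity allowed by dim V = rank + nullity with rank <= dim W."""
    # rank(T) can be at most min(dim V, dim W).
    max_rank = min(dim_domain, dim_codomain)
    return dim_domain - max_rank

def may_be_one_to_one(dim_domain, dim_codomain):
    """One-to-one requires nullity 0, which is possible only if the bound is 0."""
    return nullity_lower_bound(dim_domain, dim_codomain) == 0

def may_be_onto(dim_domain, dim_codomain):
    """Onto requires rank = dim W, and rank can never exceed dim V."""
    return dim_domain >= dim_codomain

# Example 5.1.32: dim M_{2x5}(R) = 10 and dim P_4(R) = 5.
print(nullity_lower_bound(10, 5))   # 5
print(may_be_one_to_one(10, 5))     # False
print(may_be_onto(10, 5))           # True
```

A result of `True` only means the dimensions do not rule the property out; whether a particular $T$ has it still depends on $T$ itself.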


Again, we have introduced more terminology and therefore, we need to discuss how to use this 
language properly. 


Watch Your Language! The terminology in this section needs care because some terms are names of vector spaces while others are the dimensions of these spaces.
Appropriate uses of terms from this section are as follows.


✓ The nullspace of T has basis {v1, v2, ..., vn}.
✓ The nullity of T is n.
✓ The range space has basis {w1, w2, ..., wm}.
✓ The rank of T is m.


Inappropriate uses of terms from this section are as follows.


✗ The nullity of T has basis {v1, v2, ..., vn}.
✗ The nullity of T has dimension n.
✗ The rank of T has dimension m.
✗ The rank of the basis B is m.

5.1.5 Exercises

For Exercises 1-16, state (a) the domain, V, (b) dim(V), (c) the codomain, W, (d) dim(W), (e) the range space, (f) the rank, (g) the nullspace, and (h) the nullity of each of the following transformations. Then (i) verify the Rank-Nullity Theorem.


1. 


2. 


ie y 
P:R? RS, where 7 (7) = x 
x—y 
x+y 
T :R? > R’, where T | y - 
0 


3. Define $F : V \to \mathcal{P}_1$, where
\[
V = \{ax^2 + (3a - 2b)x + b \mid a, b \in \mathbb{R}\} \subseteq \mathcal{P}_2,
\]
by
\[
F(ax^2 + (3a - 2b)x + b) = 2ax + 3a - 2b.
\]


. Define G : Pz > M2 x2 by 


2 _ a a-—b 
Glax torta=(. 4) oP). 


. Define h: V > P}, where 


by 
h Ge if =ax+ec 
Ob—c2a} ' 
Let 
3a 
b 2a 
T= 0 a,b,cER 
c b 
RYa 


And define f : Z > P2 by 


f(D) =ax*+(b+e)x+ (ato). 


5.1 


10. 


11. 
12. 
13. 


14. 
15. 
16. 
. Define f : M2x2 > R* by 


Sy 

a ™ 
9 8 

QS 
Wa 
Naess 


. Define f : P2 > R?* by 


flaxt+bx-o= (290), 


Cc 


. T: Vo > Vr, where Vo is the space of objects with 4 voxels and Vz is the space of radiographs 


with 4 pixels and 


Xy+ 2% 
XY x3 
: t3+ U4 
T = 51 
a) U4 371 + £2 + 304 
if di 
301 + 23+ gu 


T : Vo — R*, where Vo is the space of objects with 4 voxels and 


X] X3 x] X3 


X2 X4 x2 X4 


where B is the standard basis for Vo. 

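11. $T : \mathcal{P}_2(\mathbb{R}) \to \mathcal{P}_2(\mathbb{R})$ defined by $T(ax^2 + bx + c) = cx^2 + ax + b$.

12. $T : \mathcal{P}_2(\mathbb{R}) \to \mathcal{P}_2(\mathbb{R})$ defined by $T(ax^2 + bx + c) = (a + b)x^2 - b + c$.

13. $T : \mathcal{P}(\mathbb{R}) \to \mathcal{P}(\mathbb{R})$ defined by $T(p(x)) = p'(x)$, where $\mathcal{P}(\mathbb{R})$ is the space of polynomials with real coefficients. (Note: This is an infinite dimensional problem and therefore more challenging.)

14. $T : H_4(\mathbb{R}) \to \mathbb{R}^4$ defined by $T(v) = [v]_{\mathcal{Y}}$, where $\mathcal{Y}$ is the basis given in Example 3.4.15.

15. $T : D(Z_2) \to D(Z_2)$ defined by $T(x) = x + x$.

16. The transformation of Exercise 12 in Section 4.2 on heat states.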


For each vector space, $V$, in Exercises 17-24, create an onto transformation $T : V \to \mathbb{R}^d$, where $d$ is also given. If such a transformation is not possible, justify why not. Determine the range, rank, nullity, and nullspace of the transformation you created.


17. 
18. 
19. 
20. 
21. 


V=R,d=3 
V=R,d=7 

V = P2(R),d =6 

V = P2(R),d =3 

V = M2 3(R),d=4 


22. V = Aa 
23. V = Aa 


R),d=6 
R),d=4 
24. V is the vector space of histograms with 7 bins, d = 3.


Consider the three radiographic scenarios in Exercises 25-27, which were previously examined in Section 4.1 (Figures 5.3, 5.4, and 5.5). For each, state the nullspace and range of the radiographic transformation, and discuss how this information helps us understand the transformation.


25.

• Height and width of image in voxels: n = 2 (Total voxels N = 4)
• Pixels per view in radiograph: m = 2
• ScaleFac = 1
• Number of views: a = 2
• Angle of the views: θ1 = 0°, θ2 = 90°


Fig. 5.3 Tomographic Scenario A. Objects are in the vector space of 2 x 2 grayscale images. Radiographs are in the 
vector space of 2 views each with 2 pixels and the geometry as shown. 


26.

• Height and width of image in voxels: n = 2 (Total voxels N = 4)
• Pixels per view in radiograph: m = 2
• ScaleFac = √2
• Number of views: a = 2
• Angle of the views: θ1 = 45°, θ2 = 135°


Fig. 5.4 Tomographic Scenario B. Objects are in the vector space of 2 x 2 grayscale images. Radiographs are in the vector space of 2 views each with 2 pixels and the geometry as shown.

27.

• Height and width of image in voxels: n = 2 (Total voxels N = 4)
• Pixels per view in radiograph: m = 4
• ScaleFac = √2/2
• Number of views: a = 1
• Angle of the views: θ1 = 45°


Fig. 5.5 Tomographic Scenario C. Objects are in the vector space of 2 x 2 grayscale images. Radiographs are in the vector space of 1 view with 4 pixels and the geometry as shown.


Additional Exercises 


28. Create a one-to-one transformation that is not onto. Describe the rank, range, nullspace, and nullity 
of the transformation you create. If no such transformation exists, justify. 

29. Prove Theorem 5.1.12. 

30. We now reconsider Example 5.1.2. 


(a) Find the matrix associated with this linear transformation. 

(b) Find the nullspace of this matrix and verify that it is the same as the nullspace of the 
transformation T. 

(c) Recall the superposition principle (Theorem 3.1.30). What does this theorem say about the preimage of the point $(x, y)$ under the transformation $T$? (The preimage of the point $(0, 0)$ under the transformation $T$ is exactly the nullspace of $T$.)

(d) Every point in the range $\operatorname{Ran}(T) = \mathbb{R}^2$ is associated with one of these parallel preimages; and each preimage has dimension 1. The set of these parallel preimages “fills out” the domain $\mathbb{R}^3$. This gives us a nice graphical way to visualize the Rank-Nullity theorem!


5.2 Matrix Spaces and the Invertible Matrix Theorem 


As we begin this section, we should revisit our tomographic goal. Suppose we are given radiographs like those on top of Figure 5.6. We know that there is a linear transformation, $T : \mathcal{O} \to \mathcal{R}$, that transforms the brain objects in $\mathcal{O}$ (whose slices can be seen on bottom in Figure 5.6) to these radiographs (vectors in $\mathcal{R}$). Our overall goal is to recover these brain slices. What we know is that, if $x$ is radiographed to get the images $b$ on top in Figure 5.6, then $T(x) = b$. Our hope is to recover $x$. Algebraically, this means we want $x = T^{-1}(b)$. We recognized, in Section 5.1, that not all transformations have an inverse. That is, $T^{-1}$ may not even exist. We discussed various properties of a transformation for which an inverse exists. In fact, we have seen several small examples of radiographic transformations without an inverse. In these examples, we saw that there are objects that are invisible to the transformation (nonzero nullspace vectors) and so the transformation is not one-to-one.

Working with transformations on vector spaces that are less standard (like the vector space of brain images or the vector space of radiographs) can be tedious and sometimes difficult. We saw in Sections 3.5 and 4.7 that these $n$-dimensional spaces are isomorphic to (act just like) the Euclidean spaces $\mathbb{R}^n$. In fact, the isomorphism is the transformation that transforms a vector $v$ to its coordinate vector $[v]_{\mathcal{B}}$ (for some basis $\mathcal{B}$ of our vector space). With this we recognized, in Section 4.4, that there is a matrix representation, $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$, of $T : V \to W$ that allows all computations of $T$ to happen in the coordinate spaces instead of directly with the vectors of some obscure or less standard vector space. In this section, we will consider the invertibility of $T$, but this time we will consider the properties of $M$ that help us recognize when $T$ is invertible or when $T$ has a left inverse. We will discuss spaces related to the matrix $M$ that are analogous to the nullspace and range space of the transformation $T$.

Fig. 5.6 Top: Three examples of radiographs of a human head taken at different orientations. Bottom: Four examples of density maps of slices of a human head.


5.2.1 Matrix Spaces 


In Section 4.4, we considered an $m \times n$ matrix, $M$, where left matrix multiplication by $M$ is a linear transformation that maps a vector from $\mathbb{R}^n$ to $\mathbb{R}^m$. Moreover, every linear transformation that maps an $n$-dimensional space $V$ to an $m$-dimensional space $W$ can be represented by left multiplication by some matrix. We next explore how the transformation spaces (nullspace and range space) are manifested in these matrix representations. Throughout this discussion, we will continue to use the notation $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$. To make the reading clearer, we suggest that whenever you see this notation, you read it with its meaning, not its symbols. That is, read it as, “$M$ is the matrix representation of $T$ with respect to the bases $\mathcal{B}_V$ and $\mathcal{B}_W$.” At times, we will write this as well.


The Nullspace of a Matrix 

Recall that the nullspace of a transformation is the space of all vectors that are invisible to the transformation. Let $V$ and $W$ be vector spaces whose dimensions are $n$ and $m$, respectively. To find the nullspace, $\operatorname{Null}(T)$, of a transformation $T : V \to W$, we look for all vectors $v \in V$ so that $T(v) = 0$. Suppose that $M$ is the matrix representation of $T$ with respect to the bases $\mathcal{B}_V$ and $\mathcal{B}_W$ of $V$ and $W$, respectively. Then, in the coordinate spaces, we consider the transformation defined by left multiplication by $M$. We will, for ease of notation, call this transformation $T_M$. Then, for $v \in \operatorname{Null}(T)$, we should have a corresponding $[v]_{\mathcal{B}_V}$ so that

\[
T_M\left([v]_{\mathcal{B}_V}\right) = [T]_{\mathcal{B}_V}^{\mathcal{B}_W} [v]_{\mathcal{B}_V} = M[v]_{\mathcal{B}_V} = [0]_{\mathcal{B}_W}.
\]

This result suggests that the matrix representation of a transformation also has a nullspace, a subspace of coordinate (domain) vectors that result in the zero (codomain) coordinate vector upon left multiplication by $M$.

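Definition 5.2.1

Let $M \in \mathcal{M}_{m\times n}(\mathbb{R})$. The nullspace of $M$, denoted $\operatorname{Null}(M)$, is

\[
\operatorname{Null}(M) = \{x \in \mathbb{R}^n \mid Mx = 0 \in \mathbb{R}^m\}.
\]

The nullity of $M$ is $\dim \operatorname{Null}(M)$.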


Definition 5.2.1 does not require a discussion about another transformation $T$. This allows us to extend the idea of the nullspace of a transformation to matrices in general, without referring to the associated linear transformation.

If $V$ and $W$ are $n$- and $m$-dimensional vector spaces, respectively, and if $T : V \to W$ has matrix representation $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$, then the nullspace of $T$ is a subspace of $V$, while the nullspace of $M$ is a subspace of $\mathbb{R}^n$. We now consider the relationship between these two vector spaces. In the following theorem, we show that a basis for the nullspace of a linear transformation can easily be used to find the nullspace for any matrix representation of that transformation.


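Theorem 5.2.2

Let $V$ and $W$ be finite-dimensional vector spaces, $T : V \to W$ be a linear transformation, and $\mathcal{B}_V$ and $\mathcal{B}_W$ be bases for $V$ and $W$, respectively. Also, let $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$ be a matrix representation of $T$. Suppose $\beta = \{\beta_1, \beta_2, \ldots, \beta_k\}$ is a basis for $\operatorname{Null}(T)$, possibly empty. Then

\[
\mu = \{[\beta_1]_{\mathcal{B}_V}, [\beta_2]_{\mathcal{B}_V}, \ldots, [\beta_k]_{\mathcal{B}_V}\}
\]
is a basis for $\operatorname{Null}(M)$.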


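Proof. Suppose $V$ and $W$ are finite-dimensional vector spaces, $T : V \to W$ is a linear transformation, and that $\mathcal{B}_V$ and $\mathcal{B}_W$ are bases for $V$ and $W$, respectively. Also, suppose that $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$. Let $\beta = \{\beta_1, \beta_2, \ldots, \beta_k\} \subseteq V$ be a basis for $\operatorname{Null}(T)$. If $\dim V = n$, then the corresponding coordinate space is $\mathbb{R}^n$. Now consider the set

\[
\mu = \{[\beta_1]_{\mathcal{B}_V}, [\beta_2]_{\mathcal{B}_V}, \ldots, [\beta_k]_{\mathcal{B}_V}\} \subseteq \mathbb{R}^n.
\]

If $\beta = \emptyset$, then $\mu = \emptyset$ and $T(0) = 0 \in W$. This means that $M[0]_{\mathcal{B}_V} = [0]_{\mathcal{B}_W}$. Thus, $\operatorname{Null}(M) = \{0\} = \operatorname{Span}(\mu)$. Now, suppose that $\beta \neq \emptyset$. We will show that $\mu$ is linearly independent and that $\operatorname{Span} \mu = \operatorname{Null}(M)$.

First, let $\alpha_1, \alpha_2, \ldots, \alpha_k$ be scalars that satisfy the linear dependence relation

\[
\alpha_1 [\beta_1]_{\mathcal{B}_V} + \alpha_2 [\beta_2]_{\mathcal{B}_V} + \ldots + \alpha_k [\beta_k]_{\mathcal{B}_V} = 0. \tag{5.1}
\]

In Section 3.5, we determined that Equation 5.1 can be rewritten as

\[
[\alpha_1 \beta_1 + \alpha_2 \beta_2 + \ldots + \alpha_k \beta_k]_{\mathcal{B}_V} = [0]_{\mathcal{B}_V}.
\]

Because coordinate vectors are unique and because $\beta$ is a basis, we know that

\[
\alpha_1 \beta_1 + \alpha_2 \beta_2 + \ldots + \alpha_k \beta_k = 0.
\]

Because $\beta$ is linearly independent, $\alpha_1 = \alpha_2 = \ldots = \alpha_k = 0$. Thus, $\mu$ is linearly independent.
Next, we will use a set argument to show that $\operatorname{Span} \mu = \operatorname{Null}(M)$.

($\subseteq$) Let $x \in \operatorname{Span} \mu$ be the vector given by

\[
x = \alpha_1 [\beta_1]_{\mathcal{B}_V} + \alpha_2 [\beta_2]_{\mathcal{B}_V} + \ldots + \alpha_k [\beta_k]_{\mathcal{B}_V}.
\]

Since $x \in \mathbb{R}^n$, there is a unique $v \in V$ so that $x = [v]_{\mathcal{B}_V}$. Then

\[
\begin{aligned}
Mx &= [T]_{\mathcal{B}_V}^{\mathcal{B}_W} [v]_{\mathcal{B}_V} \\
&= [T]_{\mathcal{B}_V}^{\mathcal{B}_W} \left( \alpha_1 [\beta_1]_{\mathcal{B}_V} + \alpha_2 [\beta_2]_{\mathcal{B}_V} + \ldots + \alpha_k [\beta_k]_{\mathcal{B}_V} \right) \\
&= \alpha_1 [T]_{\mathcal{B}_V}^{\mathcal{B}_W} [\beta_1]_{\mathcal{B}_V} + \alpha_2 [T]_{\mathcal{B}_V}^{\mathcal{B}_W} [\beta_2]_{\mathcal{B}_V} + \ldots + \alpha_k [T]_{\mathcal{B}_V}^{\mathcal{B}_W} [\beta_k]_{\mathcal{B}_V} \\
&= \alpha_1 [T\beta_1]_{\mathcal{B}_W} + \alpha_2 [T\beta_2]_{\mathcal{B}_W} + \ldots + \alpha_k [T\beta_k]_{\mathcal{B}_W} \\
&= \alpha_1 [0]_{\mathcal{B}_W} + \alpha_2 [0]_{\mathcal{B}_W} + \ldots + \alpha_k [0]_{\mathcal{B}_W} \\
&= [0]_{\mathcal{B}_W}.
\end{aligned}
\]

So, $x \in \operatorname{Null}(M)$ and $\operatorname{Span} \mu \subseteq \operatorname{Null}(M)$.

($\supseteq$) Finally, we show that $\operatorname{Null}(M) \subseteq \operatorname{Span} \mu$. Let $[x]_{\mathcal{B}_V} \in \operatorname{Null}(M)$. Then $0 = M[x]_{\mathcal{B}_V} = [T]_{\mathcal{B}_V}^{\mathcal{B}_W} [x]_{\mathcal{B}_V} = [Tx]_{\mathcal{B}_W}$. So, $x \in \operatorname{Null}(T)$ and can be written $x = \alpha_1 \beta_1 + \alpha_2 \beta_2 + \ldots + \alpha_k \beta_k$. Now, $[x]_{\mathcal{B}_V} = \alpha_1 [\beta_1]_{\mathcal{B}_V} + \alpha_2 [\beta_2]_{\mathcal{B}_V} + \ldots + \alpha_k [\beta_k]_{\mathcal{B}_V}$. Thus, $[x]_{\mathcal{B}_V} \in \operatorname{Span} \mu$ and $\operatorname{Null}(M) \subseteq \operatorname{Span} \mu$.

Therefore, $\operatorname{Null}(M) = \operatorname{Span} \mu$. Thus, $\mu$ is a basis for $\operatorname{Null}(M)$.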


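Corollary 5.2.3

Let $V$ and $W$ be finite-dimensional vector spaces, $T : V \to W$ be a linear transformation, and $\mathcal{B}_V$ and $\mathcal{B}_W$ be bases for $V$ and $W$, respectively. Also let $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$. The transformation $U : \operatorname{Null}(T) \to \operatorname{Null}(M)$ defined by $U(v) = [v]_{\mathcal{B}_V}$ is an isomorphism. Furthermore, $\operatorname{Nullity}(T) = \operatorname{Nullity}(M)$.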


The proof of this corollary is the subject of Exercise 9. The main message here is that the nullspace of a matrix $M$ is isomorphic to the nullspace of the corresponding transformation, $T$.

Example 5.2.4 Suppose $M : \mathbb{R}^n \to \mathbb{R}^m$ is the matrix representation of a linear radiographic transformation $T : V \to W$, relative to some bases. $\operatorname{Null}(M)$ is the set of all invisible objects as represented in the coordinate space $\mathbb{R}^n$, and $\operatorname{Null}(T)$ is the set of all invisible objects in $V$, which are grayscale images. And, Corollary 5.2.3 tells us that there is a one-to-one and onto transformation that maps from $\operatorname{Null}(T)$ to $\operatorname{Null}(M)$.


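Example 5.2.5 Given the matrix
\[
M = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & -1 \\ -1 & 0 & 2 \end{pmatrix},
\]
we can find $\operatorname{Null}(M)$ by solving the matrix equation
\[
\begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & -1 \\ -1 & 0 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]
The solution set is
\[
\left\{ \begin{pmatrix} 2z \\ -3z \\ z \end{pmatrix} \in \mathbb{R}^3 \ \middle|\ z \in \mathbb{R} \right\}.
\]
So, the nullspace of $M$ is given by
\[
\operatorname{Null}(M) = \operatorname{Span}\left\{ \begin{pmatrix} 2 \\ -3 \\ 1 \end{pmatrix} \right\}.
\]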

The above example showed no connection to any transformation. We can talk about the nullspace 
of a matrix without relating it to a transformation. 
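
The row reduction behind Example 5.2.5 can be reproduced with exact arithmetic. The following is a sketch rather than the text's own procedure; the helper names `rref` and `nullspace_basis` are ours. It builds one basis vector per free column of the reduced matrix.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce a matrix; return (reduced matrix, pivot column indices)."""
    A = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue  # free column
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [x / lead for x in A[r]]
        for i in range(len(A)):
            if i != r:
                f = A[i][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == len(A):
            break
    return A, pivots

def nullspace_basis(rows):
    """One basis vector of Null(M) for each free column of the reduced matrix."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)            # set the free variable to 1
        for row_index, p in enumerate(pivots):
            v[p] = -R[row_index][free]   # solve for each leading variable
        basis.append(v)
    return basis

M = [[1, 1, 1], [2, 1, -1], [-1, 0, 2]]
basis = nullspace_basis(M)
print([[str(x) for x in v] for v in basis])   # [['2', '-3', '1']]
```

The single vector returned agrees with the spanning vector found in Example 5.2.5.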


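Example 5.2.6 Let $V = \{ax^3 + bx^2 - ax + c \mid a, b, c \in \mathbb{R}\}$ and $W = \mathcal{M}_{2\times 3}(\mathbb{R})$. Now, let us consider the transformation $T : V \to W$ defined by
\[
T(ax^3 + bx^2 - ax + c) = \begin{pmatrix} a & a & a \\ a+b & -a & -b \end{pmatrix}.
\]
Determine the matrix representation, $M$, of $T$ and then find $\operatorname{Null}(M)$. First, we choose a basis for $V$.
\[
V = \{ax^3 + bx^2 - ax + c \mid a, b, c \in \mathbb{R}\} = \operatorname{Span}\{x^3 - x, x^2, 1\},
\]
and $\{x^3 - x, x^2, 1\}$ is linearly independent. Therefore, a basis for $V$ is given by
\[
\mathcal{B}_V = \{x^3 - x, x^2, 1\}.
\]
Next, we find where each basis element maps to. We will write each result as a coordinate vector using the standard basis for $W$. Using the definition of $T$, we get
\[
[T(x^3 - x)]_{\mathcal{B}_W} = \left[ \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \end{pmatrix} \right]_{\mathcal{B}_W} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ -1 \\ 0 \end{pmatrix}, \quad
[T(x^2)]_{\mathcal{B}_W} = \left[ \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & -1 \end{pmatrix} \right]_{\mathcal{B}_W} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ -1 \end{pmatrix}, \quad \text{and}
\]
\[
[T(1)]_{\mathcal{B}_W} = \left[ \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \right]_{\mathcal{B}_W} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]
Recall, these results form the columns of the matrix representation. So,
\[
M = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}.
\]
To find $\operatorname{Null}(M)$, we solve the matrix equation
\[
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]
The solution set is
\[
\left\{ [v]_{\mathcal{B}_V} = \begin{pmatrix} 0 \\ 0 \\ z \end{pmatrix} \ \middle|\ z \in \mathbb{R} \right\}.
\]
That is,
\[
\operatorname{Null}(M) = \operatorname{Span}\left\{ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}.
\]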
In the last two examples, we did not show the steps to solving the matrix equations. As a review of 
past materials, we encourage the reader to check these steps. 
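
One way to check the matrix in Example 5.2.6 is to apply $T$ to the coordinates of each basis element and collect the results as columns. In this sketch a polynomial $ax^3 + bx^2 - ax + c$ is stored as the triple $(a, b, c)$, which is an encoding chosen for illustration, and the function names are hypothetical.

```python
def T(a, b, c):
    """Coordinates (standard basis of M_{2x3}) of T(ax^3 + bx^2 - ax + c)."""
    # Entries of the 2x3 matrix read row by row: (a, a, a, a + b, -a, -b).
    return [a, a, a, a + b, -a, -b]

# The basis {x^3 - x, x^2, 1} corresponds to these (a, b, c) triples.
basis_coords = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# Images of the basis vectors are the columns of M.
columns = [T(*v) for v in basis_coords]
M = [list(row) for row in zip(*columns)]

def matvec(matrix, vector):
    """Left multiplication of a vector by a matrix."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

print(M)
# The constant polynomial 1 has coordinate vector (0, 0, 1) and is invisible to T.
print(matvec(M, [0, 0, 1]))
```

The second printed vector is all zeros, which matches $\operatorname{Null}(M)$ being spanned by the coordinate vector of the constant polynomial, as Theorem 5.2.2 predicts.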


The Column Space of a Matrix 

We just saw how there is a close relationship between the nullspace of a transformation and the nullspace 
of the corresponding matrix representation. Similarly, we will see here how the range space of a linear 
transformation is related to the column space of the corresponding matrix representation. But first, let 
us define the column space of a matrix. 


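Definition 5.2.7

Let $M$ be an $m \times n$ matrix with columns $\alpha_1, \alpha_2, \ldots, \alpha_n \in \mathbb{R}^m$. The column space of $M$, denoted $\operatorname{Col}(M)$, is defined to be
\[
\operatorname{Col}(M) = \operatorname{Span}\{\alpha_1, \alpha_2, \ldots, \alpha_n\}.
\]

The rank of $M$ is $\dim(\operatorname{Col}(M))$.


The columns of $M$ are not necessarily a basis for $\operatorname{Col}(M)$ (see Exercise 8). But, we know that, because $\operatorname{Col}(M)$ is a subspace of $\mathbb{R}^m$, $\operatorname{Rank}(M) = \dim(\operatorname{Col}(M)) \leq m$. Now, since the range space of $T : V \to W$ is a subspace of $W$, $\operatorname{Rank}(T) \leq m$ also. The next two theorems address the relationship between the range of $T$ and $\operatorname{Col}(M)$.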


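Theorem 5.2.8
Let $V$ and $W$ be vector spaces, $T : V \to W$ be linear, and $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$, where $\mathcal{B}_V$ and $\mathcal{B}_W$ are ordered bases for $V$ and $W$, respectively. Then,

\[
\operatorname{Col}(M) = \{[T(x)]_{\mathcal{B}_W} \mid x \in V\}.
\]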


The proof is the subject of Exercise 10. 

Recall that $\operatorname{Ran}(T) = \{T(x) \mid x \in V\}$. So, Theorem 5.2.8 says that the column space of $M$ is the set of all coordinate vectors of the range space of $T$, relative to the basis $\mathcal{B}_W$. The next theorem makes a connection between the bases of $\operatorname{Ran}(T)$ and $\operatorname{Col}(M)$.

Theorem 5.2.9
Let $V$ and $W$ be finite-dimensional vector spaces, $T : V \to W$ be a linear transformation, and $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$, where $\mathcal{B}_V$ and $\mathcal{B}_W$ are ordered bases for $V$ and $W$, respectively. Suppose $\beta = \{\beta_1, \beta_2, \ldots, \beta_m\}$ is a basis for $\operatorname{Ran}(T)$. Then $\mu = \{[\beta_1]_{\mathcal{B}_W}, [\beta_2]_{\mathcal{B}_W}, \ldots, [\beta_m]_{\mathcal{B}_W}\}$ is a basis for $\operatorname{Col}(M)$.


The proof is similar to the proof of Theorem 5.2.2 and is the subject of Exercise 11. 


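Corollary 5.2.10

Let $V$ and $W$ be finite-dimensional vector spaces, $T : V \to W$ be a linear transformation, and $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$, where $\mathcal{B}_V$ and $\mathcal{B}_W$ are ordered bases for $V$ and $W$, respectively. Then, the transformation $U : \operatorname{Ran}(T) \to \operatorname{Col}(M)$ defined by $U(v) = [v]_{\mathcal{B}_W}$ is an isomorphism. Furthermore, $\operatorname{Rank}(T) = \operatorname{Rank}(M)$.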


The proof is the subject of Exercise 12. The main message here is that the column space, Col(M), 
of a matrix M is isomorphic to the range space of the corresponding transformation, T. 


Example 5.2.11 Suppose $M : \mathbb{R}^n \to \mathbb{R}^m$ is the matrix representation of a linear radiographic transformation $T : V \to W$, relative to some bases. $\operatorname{Col}(M)$ is the set of all possible radiographs as represented in the coordinate space $\mathbb{R}^m$. $\operatorname{Ran}(T)$ is the set of all possible radiographs in $W$, which are grayscale images. Corollary 5.2.10 says that there is a one-to-one and onto transformation that maps $\operatorname{Ran}(T)$ to $\operatorname{Col}(M)$.


Next, we consider a method for finding a basis for $\operatorname{Col}(M)$. The following method utilizes matrix reduction to perform the method of spanning set reduction (see page 143). We want to find all $w \in \mathbb{R}^m$ so that there exists a $v \in \mathbb{R}^n$ such that $Mv = w$.


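Example 5.2.12 Let us consider the matrix
\[
M = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & -1 \\ -1 & 0 & 2 \end{pmatrix}.
\]
To find $\operatorname{Col}(M)$, we find
\[
w \in \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} \right\}.
\]
That is,
\[
w = x \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + y \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + z \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix},
\]
for some $x, y, z \in \mathbb{R}$. This is equivalent to the matrix equation
\[
\begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & -1 \\ -1 & 0 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = w.
\]
So, we really are looking for all elements of the set
\[
\{w \in \mathbb{R}^3 \mid Mv = w \text{ for } v \in \mathbb{R}^3\} = \left\{ \begin{pmatrix} a \\ b \\ c \end{pmatrix} \ \middle|\ \begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & -1 \\ -1 & 0 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \text{ where } x, y, z \in \mathbb{R} \right\}.
\]

Next, we formulate the corresponding augmented matrix and examine the nature of the solution set. (Here, the right-hand side of the augmented matrix is important to track through the matrix reduction, so we show these steps.)
\[
\left(\begin{array}{ccc|c} 1 & 1 & 1 & a \\ 2 & 1 & -1 & b \\ -1 & 0 & 2 & c \end{array}\right)
\xrightarrow[R_3 = r_1 + r_3]{R_2 = -2r_1 + r_2}
\left(\begin{array}{ccc|c} 1 & 1 & 1 & a \\ 0 & -1 & -3 & -2a + b \\ 0 & 1 & 3 & a + c \end{array}\right)
\]
\[
\xrightarrow[R_3 = r_2 + r_3]{R_1 = r_2 + r_1,\ R_2 = -r_2}
\left(\begin{array}{ccc|c} 1 & 0 & -2 & -a + b \\ 0 & 1 & 3 & 2a - b \\ 0 & 0 & 0 & -a + b + c \end{array}\right).
\]

The last row of the matrix corresponds to the equation
\[
0 = -a + b + c.
\]
Therefore, as long as $w = \begin{pmatrix} a \\ b \\ c \end{pmatrix}$ with $-a + b + c = 0$, the equation $Mv = w$ has a solution. This means that
\[
\operatorname{Col}(M) = \left\{ \begin{pmatrix} a \\ b \\ c \end{pmatrix} \ \middle|\ -a + b + c = 0 \right\} = \left\{ \begin{pmatrix} b + c \\ b \\ c \end{pmatrix} \ \middle|\ b, c \in \mathbb{R} \right\} = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \right\}.
\]

The reduced matrix has two leading entries, one in each of the first two columns. This led to a basis containing two vectors. It turns out that there is a simpler method for finding a basis for the column space: choose as basis vectors the matrix columns corresponding to reduced matrix columns containing leading entries. Now, if $\operatorname{Col}(M) = \operatorname{Span}\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$, then to form a basis, we need to find a maximal linearly independent subset. The following argument shows that the columns corresponding to the leading 1's are such a set.

Choose $\alpha_1$ to be the first basis element for $\operatorname{Col}(M)$. (If $\alpha_1$ is all zeros, we just start with the first column that isn't all zeros.) Since $\alpha_1$ is not the zero vector, $\{\alpha_1\}$ is linearly independent. Now, we check to see if $\{\alpha_1, \alpha_2\}$ is linearly independent. We can do this by solving for $a$ in the equation $a\alpha_1 = \alpha_2$. That is, we reduce the augmented matrix
\[
\left(\begin{array}{c|c} \alpha_1 & \alpha_2 \end{array}\right).
\]
If the second column has a leading one, then that means there is a row with zeros to the left of the augment and a nonzero on the right of the augment. This would mean that the equation $a\alpha_1 = \alpha_2$ has no solution and they are linearly independent. If there is no leading entry in the second column, then these columns are linearly dependent.

Now, supposing that $\{\alpha_1, \alpha_2\}$ is linearly independent, we check to see if $\alpha_3$ can be added to the set to form a new linearly independent set, $\{\alpha_1, \alpha_2, \alpha_3\}$. That means we want to solve for $a$ and $b$ in the equation $a\alpha_1 + b\alpha_2 = \alpha_3$. This can be done by reducing the augmented matrix
\[
\left(\begin{array}{cc|c} \alpha_1 & \alpha_2 & \alpha_3 \end{array}\right).
\]
If, after reducing, the third column has a leading entry, then the equation has no solution. That is, if there is a leading entry to the right of the augment, $\{\alpha_1, \alpha_2, \alpha_3\}$ is linearly independent, and so are its subsets $\{\alpha_1, \alpha_3\}$ and $\{\alpha_2, \alpha_3\}$. If not, then $\alpha_3 \in \operatorname{Span}\{\alpha_1, \alpha_2\}$ and $\{\alpha_1, \alpha_2, \alpha_3\}$ is linearly dependent.

We can continue this process of collecting linearly independent vectors by recognizing that the set of columns corresponding to a leading entry in the reduced matrix is a linearly independent set. So we choose them to be in the basis for $\operatorname{Col}(M)$. All other columns are in the span of these chosen vectors.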
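
The leading-entry rule described above can be sketched in a few lines. This is an illustration under our own naming (`rref`, `column_space_basis`), applied to the matrix of Example 5.2.12: the columns of the original matrix that sit above leading entries of the reduced matrix are kept, and the entries of a non-leading column of the reduced matrix give the coefficients that express it in terms of the kept columns.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce a matrix; return (reduced matrix, pivot column indices)."""
    A = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue  # no leading entry in this column
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [x / lead for x in A[r]]
        for i in range(len(A)):
            if i != r:
                f = A[i][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == len(A):
            break
    return A, pivots

def column_space_basis(rows):
    """Columns of the ORIGINAL matrix that correspond to leading entries."""
    _, pivots = rref(rows)
    return [[row[c] for row in rows] for c in pivots]

M = [[1, 1, 1], [2, 1, -1], [-1, 0, 2]]
print(column_space_basis(M))   # [[1, 2, -1], [1, 1, 0]]

# The third column has no leading entry; the reduced matrix records how it
# is built from the first two columns.
R, _ = rref(M)
cols = [[row[c] for row in M] for c in range(3)]
combo = [R[0][2] * x + R[1][2] * y for x, y in zip(cols[0], cols[1])]
print(combo == cols[2])        # True
```

Note that the basis returned here, the first two columns of $M$, differs from the spanning set found by solving $-a + b + c = 0$, but both sets span the same plane.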


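Example 5.2.13 Let $V = \{ax^3 + bx^2 - ax + c \mid a, b, c \in \mathbb{R}\}$ and $W = \mathcal{M}_{2\times 3}(\mathbb{R})$. Now, let us consider the transformation $T : V \to W$ defined by
\[
T(ax^3 + bx^2 - ax + c) = \begin{pmatrix} a & a & a \\ a+b & -a & -b \end{pmatrix}.
\]
Recall from Example 5.2.6 that we found a basis
\[
\mathcal{B}_V = \{x^3 - x, x^2, 1\}
\]
of $V$ and the corresponding matrix representation is
\[
M = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}.
\]
To find $\operatorname{Col}(M)$, we find all $w \in \mathbb{R}^6$ so that
\[
w \in \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \right\}.
\]
In other words, we find all $w \in \mathbb{R}^6$ so that there is a $v \in \mathbb{R}^3$ with $Mv = w$. In this example, it is clear that the last column is not part of a linearly independent set. Also, it is clear that the first two columns are linearly independent (they are not multiples of one another). Thus, a basis for the column space is
\[
\left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ -1 \end{pmatrix} \right\}.
\]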


The Row Space of a Matrix
Now, since an $m \times n$ matrix $M$ has rows that are vectors in $\mathbb{R}^n$, we can also introduce the subspace of $\mathbb{R}^n$ that relates to the rows of the matrix. The definition of the row space is very similar to Definition 5.2.7.


Definition 5.2.14

Let $M$ be an $m \times n$ matrix with rows $\beta_1^T, \beta_2^T, \ldots, \beta_m^T$, where $\beta_1, \beta_2, \ldots, \beta_m \in \mathbb{R}^n$. The row space of $M$, denoted $\operatorname{Row}(M)$, is defined to be

\[
\operatorname{Row}(M) = \operatorname{Span}\{\beta_1, \beta_2, \ldots, \beta_m\}.
\]


Recall, for $v \in \mathbb{R}^n$, $v^T$ is the transpose of $v$ (see Definition 2.5.14). Given a matrix $M$, we find $\operatorname{Row}(M)$ by finding linear combinations of the rows. We can use the method of Spanning Set Reduction (see page 143) to find a basis for $\operatorname{Row}(M)$. Let us consider how this method can be employed.

Example 5.2.15 Let $M$ be the matrix defined by
\[
M = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & -1 \\ -1 & 0 & 2 \end{pmatrix}.
\]
Define $r_1^T, r_2^T, r_3^T$ to be the three rows, in order, of $M$. When performing matrix reduction techniques, we are finding linear combinations of the rows. Therefore, we can consider the reduced echelon form, $\widetilde{M}$, of $M$ found in Example 5.2.12,
\[
\widetilde{M} = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{pmatrix}.
\]
As we did in Example 5.2.12, define the rows of $\widetilde{M}$ to be $R_1^T, R_2^T, R_3^T$. We learn from this form (and the steps taken to get to this form) that
\[
\begin{aligned}
R_1 &= -r_1 + r_2 \\
R_2 &= 2r_1 - r_2 \\
R_3 &= -r_1 + r_2 + r_3.
\end{aligned}
\]
From these equations, we can see that the rows of the reduced echelon form are vectors in $\operatorname{Row}(M)$. We also see that, since $R_3^T = (0\ 0\ 0)$, the rows of $M$ are linearly dependent. Indeed, $r_3 = r_1 - r_2$. We see also that we have found two vectors, $R_1, R_2 \in \operatorname{Span}\{r_1, r_2, r_3\}$, that are linearly independent. We also know that, since $r_1, r_2, r_3 \in \operatorname{Span}\{R_1, R_2\}$, a basis for $\operatorname{Row}(M)$ is given by $\{R_1, R_2\}$. In fact, this process will always lead to a basis for $\operatorname{Row}(M)$.


It is important to note that $\dim(\operatorname{Row}(M))$ is the number of nonzero rows in the reduced echelon form. Putting this together with our process for finding $\operatorname{Col}(M)$, we see that the number of columns corresponding to leading entries in the reduced echelon form, $\widetilde{M}$, is the same as the number of nonzero rows in $\widetilde{M}$. That means that $\operatorname{rank}(M)$ is also equal to $\dim(\operatorname{Row}(M))$.
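
As a small check of the row space process, the nonzero rows of the reduced echelon form can be collected directly. The sketch below uses our own helper names (`rref`, `row_space_basis`) and the matrix of Example 5.2.15; it also compares the number of nonzero rows with the number of leading entries.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce a matrix; return (reduced matrix, pivot column indices)."""
    A = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue  # no leading entry in this column
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [x / lead for x in A[r]]
        for i in range(len(A)):
            if i != r:
                f = A[i][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == len(A):
            break
    return A, pivots

def row_space_basis(rows):
    """Nonzero rows of the reduced echelon form."""
    R, _ = rref(rows)
    return [row for row in R if any(x != 0 for x in row)]

M = [[1, 1, 1], [2, 1, -1], [-1, 0, 2]]
basis = row_space_basis(M)
print([[str(x) for x in row] for row in basis])   # [['1', '0', '-2'], ['0', '1', '3']]

# Number of nonzero rows equals number of leading entries.
_, pivots = rref(M)
print(len(basis) == len(pivots))                  # True
```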

Another method for finding a basis for the row space of a matrix is to consider the transpose, $M^T$, (see Definition 2.5.14) of the matrix $M$.


Theorem 5.2.16
Let $M$ be an $m \times n$ matrix. Then $\operatorname{Row}(M) = \operatorname{Col}(M^T)$ and $\operatorname{Rank}(M) = \operatorname{Rank}(M^T)$.


Proof. Let $M$ have rows $r_1^T, r_2^T, \ldots, r_m^T$, where $r_1, r_2, \ldots, r_m \in \mathbb{R}^n$. By Definitions 5.2.7 and 5.2.14,
\[
\operatorname{Row}(M) = \operatorname{Span}\{r_1, r_2, \ldots, r_m\} = \operatorname{Col}(M^T).
\]

This means that $\operatorname{Rank}(M^T) = \dim(\operatorname{Row}(M))$.

Now, suppose $c_1, c_2, \ldots, c_n \in \mathbb{R}^m$ are the columns of $M$. We know that $\operatorname{Rank}(M)$ is the number of linearly independent columns of $M$. Suppose $\operatorname{Rank}(M) = k \leq m$ and suppose, without loss of generality, that $c_1, c_2, \ldots, c_k$ are linearly independent. Then the augmented matrix corresponding to the vector equations
\[
\begin{aligned}
\alpha_1 c_1 + \alpha_2 c_2 + \ldots + \alpha_k c_k &= c_{k+1} \\
\alpha_1 c_1 + \alpha_2 c_2 + \ldots + \alpha_k c_k &= c_{k+2} \\
&\ \ \vdots \\
\alpha_1 c_1 + \alpha_2 c_2 + \ldots + \alpha_k c_k &= c_n
\end{aligned}
\]
is given by
\[
\left(\begin{array}{cccc|cccc} c_1 & c_2 & \ldots & c_k & c_{k+1} & c_{k+2} & \ldots & c_n \end{array}\right).
\]

This matrix has the same columns as $M$ and has reduced echelon form with no leading entries on the right side of the augment. That is, the reduced echelon form has $m - k$ rows of zeros at the bottom. In terms of the rows of $M$, we see that $M$ is the matrix
\[
\begin{pmatrix} r_1^T \\ r_2^T \\ \vdots \\ r_m^T \end{pmatrix}.
\]

Again, the reduced echelon form has $k$ rows with leading entries and $m - k$ rows of zeros. That is, there are $k$ linearly independent rows in $M$. Therefore, $\operatorname{Rank}(M^T) = \dim(\operatorname{Row}(M)) = \operatorname{Rank}(M)$.
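
Theorem 5.2.16 can be spot-checked by counting leading entries for a matrix and for its transpose. This is only a numerical illustration, not a proof; the helper names are ours, and the $2 \times 4$ matrix `N` is a hypothetical example that does not come from the text.

```python
from fractions import Fraction

def rank(rows):
    """Number of leading entries found during row reduction."""
    A = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue  # no leading entry in this column
        A[r], A[p] = A[p], A[r]
        # Clear the entries below the leading entry.
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return r

def transpose(rows):
    """Swap rows and columns."""
    return [list(col) for col in zip(*rows)]

M = [[1, 1, 1], [2, 1, -1], [-1, 0, 2]]   # matrix from Example 5.2.15
N = [[1, 2, 3, 4], [2, 4, 6, 8]]          # hypothetical 2 x 4 matrix

for name, mat in (("M", M), ("N", N)):
    print(name, rank(mat), rank(transpose(mat)))
```

For both matrices the two printed ranks agree, as the theorem states.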


5.2.2 The Invertible Matrix Theorem


In this section, we explore the properties of matrix representations of linear transformations that are 
one-to-one and onto. In particular, we wish to know which matrix properties indicate the invertibility 
properties of the corresponding linear transformation. 

Recall that, if $V$ and $W$ are vector spaces, then we can discuss the invertibility of a linear transformation, $T : V \to W$. We say that $T$ is invertible if and only if there is another linear transformation, $S : W \to V$, so that $T \circ S = S \circ T = \mathrm{id}$, where $\mathrm{id} : V \to V$ is the identity transformation. We will gather and add to theorems from previous sections to create the Invertible Matrix Theorem.

Before getting deep into theorems, let us first define what we mean by the inverse of a matrix. It is 
important, as we connect linear transformations to matrices, to consider the connection to an invertible 
matrix and an invertible linear transformation. 


Definition 5.2.17

Let $M$ be a real $n \times n$ matrix. $M$ is invertible if there exists a real $n \times n$ matrix, called the inverse of $M$ and denoted $M^{-1}$, such that $MM^{-1} = M^{-1}M = I_n$.

For a matrix to have an inverse it necessarily has the same number of rows and columns. We can understand this by noticing that the products $AB$ and $BA$ of two matrices $A$ and $B$ only make sense when the number of columns of $A$ matches the number of rows of $B$ and vice versa.

The next example demonstrates a method for finding the inverse of a matrix that will be useful in 
understanding the theorems that follow. 


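Example 5.2.18 Let $A$ be the $3 \times 3$ matrix given by

\[
A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & 1 & 1 \\ 1 & 3 & 3 \end{pmatrix}.
\]

If $A$ has an inverse, then there is a $3 \times 3$ matrix $B$ so that $AB = BA = I_3$. That is, we can find

\[
B = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix},
\]

so that

\[
AB = \begin{pmatrix} 1 & 2 & 1 \\ 1 & 1 & 1 \\ 1 & 3 & 3 \end{pmatrix}
\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]

and

\[
BA = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 \\ 1 & 1 & 1 \\ 1 & 3 & 3 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]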


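In Section 3.1, we saw that a matrix product can be rewritten as a linear combination of the columns.
Suppose $a_1, a_2, a_3$ are the columns of $A$. Then we can rewrite the columns of $AB$ as

\[
\begin{aligned}
a\,a_1 + d\,a_2 + g\,a_3 &= e_1, \\
b\,a_1 + e\,a_2 + h\,a_3 &= e_2, \\
c\,a_1 + f\,a_2 + i\,a_3 &= e_3,
\end{aligned}
\]

where $e_1, e_2, e_3$ are the columns of $I_3$. Another way to write this is as matrix equations

\[
A\begin{pmatrix} a \\ d \\ g \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad
A\begin{pmatrix} b \\ e \\ h \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad
A\begin{pmatrix} c \\ f \\ i \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]

We could solve the first of these three equations for the first column vector $\begin{pmatrix} a \\ d \\ g \end{pmatrix}$ of $B$ by row
reducing the augmented matrix

\[
\left(\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 3 & 3 & 0 \end{array}\right).
\]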
We could then solve for the other columns of $B$ similarly. However, we would have repeated essentially
the same row reduction computation three times. To simultaneously solve these three equations, we
can use the method of matrix reduction on the augmented matrix

\[
\left(\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 3 & 3 & 0 & 0 & 1 \end{array}\right).
\]


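If we reduce this and the result is $I_3$ on the left side of the augment, then we have exactly one
solution, $B$, to the matrix equation $AB = I_3$. This matrix $B$ is the inverse. In fact, we reduce the above
and get

\[
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & \tfrac{3}{2} & -\tfrac{1}{2} \\ 0 & 1 & 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 & \tfrac{1}{2} & \tfrac{1}{2} \end{array}\right).
\]

Therefore, the inverse, $B$, is given by

\[
B = \begin{pmatrix} 0 & \tfrac{3}{2} & -\tfrac{1}{2} \\ 1 & -1 & 0 \\ -1 & \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}.
\]

One can easily check that, indeed, $AB = I_3$ and $BA = I_3$.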
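The simultaneous reduction in Example 5.2.18 can also be carried out numerically. The following OCTAVE/MATLAB lines are a minimal sketch, not part of the original example: they row reduce the augmented matrix $(A \mid I_3)$ with the built-in rref function and read the candidate inverse from the right-hand block. The variable names R, B, and left_is_identity are our own choices.

```octave
% Matrix A from Example 5.2.18
A = [1 2 1; 1 1 1; 1 3 3];

% Row reduce the augmented matrix (A | I_3)
R = rref([A eye(3)]);

% If the left block reduces to I_3, the right block is the inverse of A
left_is_identity = norm(R(:,1:3) - eye(3)) < 1e-12;
B = R(:,4:6);

disp(left_is_identity)     % 1 when A is row equivalent to I_3
disp(B)                    % compare with the matrix B found in the example
disp(norm(A*B - eye(3)))   % numerically zero when AB = I_3
disp(norm(B*A - eye(3)))   % numerically zero when BA = I_3
```

If the left block did not reduce to $I_3$, the right block would not be an inverse, which is why the sketch checks the left block before trusting B.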


This example suggests that a matrix must be row equivalent to the identity matrix in order for it to
be invertible. Another useful result that is suggested by the above example is the relationship between
the determinants $\det(M)$ and $\det(M^{-1})$ whenever $M^{-1}$ exists. We present the result in the following
theorem.


Theorem 5.2.19
Let $M$ be an $n \times n$ invertible matrix. Then, $\det(M)\det(M^{-1}) = 1$.


The proof of Theorem 5.2.19 follows from the definition of the determinant. In particular, we see in
Example 5.2.18 that we can compute the inverse of $M$ by reducing $M$ to $I$ while at the same time, using
the same row operations, taking $I$ to $M^{-1}$. This means that we "undo" the determinant calculation for
$M$ to get the determinant calculation for $M^{-1}$. See Exercise 2 to try an example of this.
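As a quick numerical check of Theorem 5.2.19 (an illustration, not a proof), the sketch below uses the matrix $A$ from Example 5.2.18. It relies only on the built-in det and inv functions; the variable names are our own.

```octave
% Matrix A from Example 5.2.18
A = [1 2 1; 1 1 1; 1 3 3];

dA    = det(A);        % determinant of A
dAinv = det(inv(A));   % determinant of the inverse of A
prod_of_dets = dA * dAinv;

fprintf('det(A) = %g, det(inv(A)) = %g, product = %g\n', dA, dAinv, prod_of_dets);
```

Because the computation is done in floating point arithmetic, the product should be compared with 1 up to a small tolerance rather than tested for exact equality.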

In addition, we can list many properties related to when a matrix is invertible. In fact, we have 
already proven many results (in Sections 2.2, 3.1, 3.2, and 3.3) that characterize invertibility. We 
combine these theorems to summarize facts about matrix invertibility in the following theorem, known 
as the Invertible Matrix Theorem. If you completed and proved Theorems 3.1.34, 3.3.20, and 3.4.40 
from exercises in previous sections, you will have most of this theorem, but in case you did not, we 
complete it here. 
Theorem 5.2.20 ((Partial) Invertible Matrix Theorem)
Let $M \in M_{n \times n}(\mathbb{R})$ with columns $c_1, c_2, \ldots, c_n \in \mathbb{R}^n$ and rows $r_1^T, r_2^T, \ldots, r_n^T$. Then the
following statements are equivalent.


(a) $M$ is invertible.

(b) The reduced echelon form of $M$ has $n$ leading ones.

(c) The matrix equation $Mx = 0$ has a unique solution.

(d) For every $b \in \mathbb{R}^n$, the matrix equation $Mx = b$ has a unique solution.

(e) The homogeneous system of equations with coefficient matrix $M$ has a unique solution.

(f) The set $\{c_1, c_2, \ldots, c_n\}$ is linearly independent.

(g) The set $\{c_1, c_2, \ldots, c_n\}$ is a basis for $\mathbb{R}^n$.

(h) $\det(M) \neq 0$.

(i) $\mathrm{Rank}(M) = n$.

(j) $\mathrm{Nullity}(M) = 0$.

(k) $\{r_1, r_2, \ldots, r_n\}$ is linearly independent.

(l) $\{r_1, r_2, \ldots, r_n\}$ is a basis for $\mathbb{R}^n$.

(m) $M^T$ is invertible.


To prove Theorem 5.2.20, we will employ results from previous sections. 


Proof. To show these statements are equivalent, we first list the equivalencies we have already 
established in theorems from previous sections. 


By Theorem 2.2.20, (c) $\iff$ (b).
By Corollary 3.1.32, (c) $\iff$ (d).
By Theorem 3.1.22, (c) $\iff$ (e).
By Theorem 3.3.14, (d) $\iff$ (f).
By Corollary 3.4.36, (d) $\iff$ (g).
By Theorem 4.5.20, (c) $\iff$ (h).


The above equivalencies show that statements (b)—(h) are all equivalent. 

Next, we show that (i) is equivalent to (g). By Definition 3.4.3, if (g) is true then $\{c_1, c_2, \ldots, c_n\}$ is
linearly independent. So, by Definition 5.2.7, $\{c_1, c_2, \ldots, c_n\}$ is a basis for $\mathrm{Col}(M)$ and $\mathrm{Rank}(M) = n$.
That is, (i) is true. Now, assume (i). We know that there are $n$ columns of $M$, and by Definition 5.2.7, we
know that these columns span an $n$-dimensional subspace of $\mathbb{R}^n$. That is, the columns span $\mathbb{R}^n$. If the
columns form a linearly dependent set, then we would be able to find a spanning set of $\mathbb{R}^n$ with fewer
than $n$ elements. Theorem 3.4.20 tells us that this is impossible. So, we know that the columns of $M$
form a basis for $\mathbb{R}^n$. But, since $\mathrm{Span}\{c_1, c_2, \ldots, c_n\} = \mathrm{Col}(M)$ we have that $\mathrm{Col}(M) = \mathbb{R}^n$. Therefore
(i) is equivalent to (g).

The Rank-Nullity Theorem tells us that $\mathrm{Nullity}(M) = n - \mathrm{Rank}(M)$, so (i) is equivalent to (j).

Now, to show (a) is equivalent to statements (b)–(j), we show that (a) implies (c) and (b) implies
(a). Because (c) is equivalent to (b), we will be able to conclude that (a) is also equivalent to all other
statements in the theorem. If $M$ is invertible, then there is a matrix $M^{-1}$ so that $M^{-1}M = I_n$. Therefore,
if $Mx = 0$, then

\[
x = I_n x = M^{-1}Mx = M^{-1}0 = 0.
\]

That is, $Mx = 0$ has only the trivial solution.
Now, suppose that the reduced echelon form of $M$ has $n$ leading entries. Then, when we reduce the
augmented matrix $(M \mid I_n)$ to reduced echelon form we get a leading 1 in every column (and therefore
every row) to the left of the augment. That is, the reduced echelon form of $(M \mid I_n)$ is a matrix of the
form $(I_n \mid A)$, where $A$ is an $n \times n$ matrix whose columns $a_1, a_2, \ldots, a_n$ satisfy

\[
\begin{aligned}
Ma_1 &= e_1, \\
Ma_2 &= e_2, \\
&\;\;\vdots \\
Ma_n &= e_n.
\end{aligned}
\]

That is, $MA = I_n$. Reversing the matrix reduction steps used to reduce $(M \mid I_n)$ to $(I_n \mid A)$ gives the same
steps one would take to reduce $(A \mid I_n)$ to the reduced echelon form, $(I_n \mid M)$. This means that a similar
argument tells us that $AM = I_n$. That is, $A = M^{-1}$ and $M$ is invertible. Thus, (a) is equivalent to
statements (b)–(j).
Finally, Theorem 5.2.16 tells us that $\mathrm{Rank}(M^T) = \mathrm{Rank}(M)$. Therefore, we have (i) is equivalent
to (m). Since $r_1, r_2, \ldots, r_n$ are the columns of $M^T$, we have (k) and (l) are both equivalent to (m).
Thus, statements (a)–(m) are all equivalent.


This theorem can be used to test a matrix for invertibility or to determine whether any of the
equivalent statements are true. For example, we can use it to test whether the columns of a matrix span
$\mathbb{R}^n$, or we can use it to determine whether the nullspace of a matrix is $\{0\}$. It tells us that for a square
matrix, either all statements are true or all statements are false. We will add more equivalent statements
to the invertible matrix theorem in later sections.
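To see the "all true or all false" behavior concretely, the following OCTAVE/MATLAB sketch evaluates three of the equivalent statements, (h), (i), and (j), for one invertible and one singular matrix. The matrix S and the tolerance 1e-12 are hypothetical choices made for this illustration, and the determinant test is only a floating point heuristic.

```octave
M = [1 2 1; 1 1 1; 1 3 3];   % invertible matrix from Example 5.2.18
S = [1 2 3; 2 4 6; 1 0 1];   % hypothetical singular matrix (row 2 = 2 * row 1)

mats = {M, S};
results = zeros(numel(mats), 3);   % one row of test outcomes per matrix

for k = 1:numel(mats)
  X = mats{k};
  n = size(X, 1);
  nonzero_det  = abs(det(X)) > 1e-12;     % statement (h), up to round-off
  full_rank    = rank(X) == n;            % statement (i)
  zero_nullity = size(null(X), 2) == 0;   % statement (j): null space basis is empty
  results(k, :) = [nonzero_det, full_rank, zero_nullity];
  fprintf('matrix %d: det test %d, rank test %d, nullity test %d\n', ...
          k, nonzero_det, full_rank, zero_nullity);
end
```

Each row of results should contain either all ones or all zeros, in agreement with the theorem.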

Now that we have characterized when a matrix is invertible, we want to use this characterization to 
say something about linear transformations. In the next theorem, we consider the relationship between 
the invertibility of a linear transformation and the invertibility of its matrix representation. 


Theorem 5.2.21
Let $V$ and $W$ be finite-dimensional vector spaces and $T : V \to W$ be linear. Let $\mathcal{B}_V$ and $\mathcal{B}_W$ be
ordered bases for $V$ and $W$, respectively. Also let $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$. Then the following statements
are equivalent.


(a) $M$ is invertible.
(b) $T$ is invertible.
(c) $[T^{-1}]_{\mathcal{B}_W}^{\mathcal{B}_V} = M^{-1}$.


Proof. Suppose $V$ and $W$ are vector spaces with dimension $n$ and $m$, respectively. By Definition 5.2.17
and Theorem 4.7.49, we know that if any of (a)–(c) are true, then $V \cong W$. So we know that $m = n$.
Assume, also, that $V$ and $W$ have ordered bases $\mathcal{B}_V$ and $\mathcal{B}_W$, respectively. Let $T : V \to W$ be a linear
transformation with matrix representation $M$. That is, $[T(v)]_{\mathcal{B}_W} = M[v]_{\mathcal{B}_V}$. Define $T_1 : V \to \mathbb{R}^n$ to
be the transformation that maps each vector $v \in V$ to its coordinate vector $[v]_{\mathcal{B}_V} \in \mathbb{R}^n$, $T_2 : \mathbb{R}^n \to \mathbb{R}^n$
to be the transformation defined by multiplying vectors in $\mathbb{R}^n$ by the matrix $M$, and $T_3 : \mathbb{R}^n \to W$ to
be the transformation that maps coordinate vectors $[w]_{\mathcal{B}_W} \in \mathbb{R}^n$ to their corresponding vector $w \in W$.
(See Figure 4.16.) That is, for $v \in V$, $w \in W$, and $x \in \mathbb{R}^n$,

\[
T_1(v) = [v]_{\mathcal{B}_V}, \quad T_2(x) = Mx, \quad \text{and} \quad T_3([w]_{\mathcal{B}_W}) = w.
\]

We showed, in Theorems 4.7.43 and 4.7.49, that $T_1$ and $T_3$ are isomorphisms and that $T_1^{-1}$ and $T_3^{-1}$
exist. We will show

\[
\text{(a)} \implies \text{(b)} \implies \text{(c)} \implies \text{(a)}.
\]


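((a) $\implies$ (b)): Suppose $M$ is invertible. Then, by definition, $M^{-1}$ exists. Define $S_2 : \mathbb{R}^n \to \mathbb{R}^n$ (as
in Figure 4.16) for $y \in \mathbb{R}^n$, by

\[
S_2(y) = M^{-1}y.
\]

For $x, y \in \mathbb{R}^n$,

\[
(S_2 \circ T_2)(x) = S_2(T_2(x)) = S_2(Mx) = M^{-1}Mx = I_n x = x
\]

and

\[
(T_2 \circ S_2)(y) = T_2(S_2(y)) = T_2(M^{-1}y) = MM^{-1}y = I_n y = y.
\]

Therefore, $S_2 = T_2^{-1}$. Now, $T = T_3 \circ T_2 \circ T_1$. (Here, we have employed Lemma 4.4.4 and
Definition 4.4.7.) Define $S = T_1^{-1} \circ T_2^{-1} \circ T_3^{-1}$, and for $v \in V$ and $w \in W$ with $T(v) = w$, we have

\[
\begin{aligned}
(S \circ T)(v) &= (T_1^{-1} \circ T_2^{-1} \circ T_3^{-1})(T(v)) \\
&= (T_1^{-1} \circ T_2^{-1} \circ T_3^{-1})(w) \\
&= T_1^{-1}(T_2^{-1}(T_3^{-1}(w))) \\
&= T_1^{-1}(T_2^{-1}([w]_{\mathcal{B}_W})) \\
&= T_1^{-1}(M^{-1}[w]_{\mathcal{B}_W}) \\
&= T_1^{-1}([v]_{\mathcal{B}_V}) \\
&= v
\end{aligned}
\]

and

\[
\begin{aligned}
(T \circ S)(w) &= T(S(w)) \\
&= T(T_1^{-1}(T_2^{-1}(T_3^{-1}(w)))) \\
&= T(T_1^{-1}(T_2^{-1}([w]_{\mathcal{B}_W}))) \\
&= T(T_1^{-1}(M^{-1}[w]_{\mathcal{B}_W})) \\
&= T(T_1^{-1}([v]_{\mathcal{B}_V})) \\
&= T(v) \\
&= w.
\end{aligned}
\]

Therefore, $S = T^{-1}$ and so, $T$ is invertible.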

((b) $\implies$ (c)): In proving (a) implies (b), we found $T_2^{-1} : \mathbb{R}^n \to \mathbb{R}^n$, defined as multiplication by
the matrix $M^{-1}$. And, if $T(v) = w$ then

\[
M[v]_{\mathcal{B}_V} = [w]_{\mathcal{B}_W}, \quad \text{and} \quad T^{-1}(w) = v.
\]

So

\[
M^{-1}[w]_{\mathcal{B}_W} = [v]_{\mathcal{B}_V} = [T^{-1}(w)]_{\mathcal{B}_V}.
\]

Then, by Lemma 4.4.4 and Definition 4.4.7, $M^{-1}$ is the matrix representation of $T^{-1}$.

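((c) $\implies$ (a)): Finally, suppose that $T$ is invertible. Then, we know that the linear transformation
$T^{-1}$ exists. That means that $(T \circ T^{-1})(w) = w$ for all $w \in W$ and $(T^{-1} \circ T)(v) = v$ for all $v \in V$.
Suppose that $T(v) = w$. Then $T^{-1}(w) = v$. We also know that there is a matrix $A$ so that

\[
[v]_{\mathcal{B}_V} = [T^{-1}(w)]_{\mathcal{B}_V} = A[w]_{\mathcal{B}_W}.
\]

We will show that $AM = MA = I_n$. Indeed, for any $x, y \in \mathbb{R}^n$, there are $v \in V$ and $w \in W$
so that $x = [v]_{\mathcal{B}_V}$ and $y = [w]_{\mathcal{B}_W}$. Therefore,

\[
\begin{aligned}
AMx &= AM[v]_{\mathcal{B}_V} \\
&= A[T(v)]_{\mathcal{B}_W} \\
&= A[w]_{\mathcal{B}_W} \\
&= [v]_{\mathcal{B}_V} \\
&= x.
\end{aligned}
\]

We also have, for any $y \in \mathbb{R}^n$,

\[
\begin{aligned}
MAy &= MA[w]_{\mathcal{B}_W} \\
&= M[T^{-1}(w)]_{\mathcal{B}_V} \\
&= M[v]_{\mathcal{B}_V} \\
&= [w]_{\mathcal{B}_W} \\
&= y.
\end{aligned}
\]

Therefore $MA = AM = I_n$. That is, $M$ is invertible with $A = M^{-1}$ and $[T^{-1}]_{\mathcal{B}_W}^{\mathcal{B}_V} = M^{-1}$.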


Theorems 5.2.20 and 5.2.21 tell us other equivalent statements about $T$ such as "$T$ is one-to-one"
and "$T$ is onto" (see Exercises 57 and 58 of Section 4.7). Theorem 5.2.21 not only tells us that $M$ is
invertible if and only if $T$ is invertible but also provides a method for finding the matrix representation
of the inverse transformation $T^{-1}$.


Example 5.2.22 Consider the vector space of functions defined by

\[
F = \left\{ f(x) = ce^{-3x} + d\cos 2x + b\sin 2x \mid c, d, b \in \mathbb{R} \right\}
\]

over the field $\mathbb{R}$. Let $T : F \to F$ be the linear transformation defined by $T(f(x)) = f'(x)$. Let $\alpha =
\{e^{-3x}, \cos 2x, \sin 2x\}$, a basis for $F$. Theorem 5.2.21 guarantees that the inverse matrix $M^{-1}$ exists if
and only if $T^{-1}$ exists. The inverse transformation is defined by

\[
T^{-1}(re^{-3x} + s\sin 2x + t\cos 2x) = -\tfrac{1}{3}re^{-3x} + \tfrac{1}{2}t\sin 2x - \tfrac{1}{2}s\cos 2x,
\]

for scalars $r, s, t \in \mathbb{R}$. The reader can verify that this definition does provide the inverse transformation
by confirming that $T^{-1}(T(\alpha)) = \alpha$ and that $T(T^{-1}(\alpha)) = \alpha$.
So now, we can find this matrix directly from $T^{-1}$.


The reader can verify that $MM^{-1} = M^{-1}M = I$.


In this section, we explore the properties of matrix representations of linear transformations that are 
one-to-one and onto. In particular, we wish to know what matrix properties indicate the invertibility 
properties of the corresponding linear transformation. 


Theorem 5.2.23
Let $T : V \to W$ be linear, $\dim V = n$ and $\dim W = m$. Let $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$ be the matrix
representation of $T$ relative to domain basis $\mathcal{B}_V$ and codomain basis $\mathcal{B}_W$. Then


(a) $T$ is onto if, and only if, the rows of $M$ form a linearly independent set in $\mathbb{R}^n$.
(b) $T$ is one-to-one if, and only if, the columns of $M$ form a linearly independent set in $\mathbb{R}^m$.


Proof. Let $V$ and $W$ be vector spaces with $\dim V = n$ and $\dim W = m$. Define $T : V \to W$ to be a
linear transformation. Let $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$ be the matrix representation of $T$ relative to domain basis $\mathcal{B}_V$
and codomain basis $\mathcal{B}_W$.

We first consider the map $T_2 : \mathbb{R}^n \to \mathbb{R}^m$ defined by $T_2(x) = Mx$, first discussed in Section 4.4.
Since the coordinate maps $T_1 : V \to \mathbb{R}^n$ and $T_3 : \mathbb{R}^m \to W$ (see Figure 4.16) are isomorphisms,
we know that $T$ is onto if and only if $T_2$ is onto and $T$ is one-to-one if and only if $T_2$ is one-to-one.
Thus, it is sufficient to show that $T_2$ is onto if and only if the rows of $M$ form a linearly independent
set in $\mathbb{R}^n$ and that $T_2$ is one-to-one if and only if the columns of $M$ form a linearly independent set in
$\mathbb{R}^m$.

We know that $T_2$ is onto if and only if for all $b \in \mathbb{R}^m$, $T_2(x) = Mx = b$ has a solution. But we also
know that $Mx = b$ has a solution for all $b \in \mathbb{R}^m$ if and only if the reduced echelon form of $M$ has a
leading one in every row. This happens if and only if no row of $M$ can be written as a linear combination
of other rows of $M$. That is, $T$ is onto if and only if the rows of $M$ are linearly independent.

Also, $T_2$ is one-to-one if and only if $T_2(x) = T_2(y)$ implies $x = y$; i.e., if and only if $T_2(x) = 0$
implies $x = 0$. We know that $T_2(x) = Mx = 0$ has a unique solution (the trivial solution) if and only
if the columns of $M$ are linearly independent. Therefore, $T$ is one-to-one if and only if the columns of
$M$ are linearly independent.

Example 5.2.24 Consider the linear transformation $T : \mathcal{P}_3(\mathbb{R}) \to \mathcal{P}_2(\mathbb{R})$ defined by $T(f(x)) =
f'(x)$. Let $\alpha = \{x^3, x^2, x, 1\}$ and $\beta = \{x^2, x, 1\}$. Then

\[
M = [T]_\alpha^\beta = \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.
\]

The rows of $M$ form a linearly independent set, but the columns of $M$ form a linearly dependent set.
Theorem 5.2.23 tells us that $T$ must be onto and is not one-to-one. Let's verify these properties.

We know that $T$ is onto if $Tu = w$ has a solution $u \in \mathcal{P}_3(\mathbb{R})$ for every $w \in \mathcal{P}_2(\mathbb{R})$. Let $w = ax^2 +
bx + c$ for arbitrary scalars $a, b, c \in \mathbb{R}$. Then $u = \tfrac{1}{3}ax^3 + \tfrac{1}{2}bx^2 + cx + d$ is a solution to $Tu = w$ for
any $d \in \mathbb{R}$.

We know that $T$ is not one-to-one if there exists $u \in \mathcal{P}_3(\mathbb{R})$, $u \neq 0$, such that $Tu = 0$. Let $u =
0x^3 + 0x^2 + 0x + d$ for any nonzero scalar $d \in \mathbb{R}$. Then, $Tu = u'(x) = 0$. Thus, $T$ is not one-to-one.
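The conclusions of Example 5.2.24 can be checked numerically from the matrix representation alone. The sketch below compares the rank of $M$ with its number of rows and columns, then verifies one solution of $Tu = w$ and one nonzero vector that $T$ sends to zero. The particular values a = 6, b = 4, c = 5, d = 7 are hypothetical choices made only for illustration.

```octave
% Matrix representation of differentiation from P_3 to P_2 (Example 5.2.24)
M = [3 0 0 0; 0 2 0 0; 0 0 1 0];
[m, n] = size(M);
r = rank(M);

rows_independent = (r == m);   % true: T is onto
cols_independent = (r == n);   % false: T is not one-to-one

% Coordinates of w = a x^2 + b x + c with hypothetical a = 6, b = 4, c = 5
a = 6; b = 4; c = 5; d = 7;
w = [a; b; c];
u = [a/3; b/2; c; d];          % coordinates of (a/3)x^3 + (b/2)x^2 + cx + d
residual = norm(M*u - w);      % zero when Tu = w

% Coordinates of the nonzero constant polynomial d, which T maps to zero
kernel_vec = [0; 0; 0; d];
image_of_kernel_vec = M * kernel_vec;
```

Changing d leaves the residual at zero, which reflects the fact that the solution of $Tu = w$ is not unique.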


Let us continue connecting invertibility of a linear transformation and invertibility of the corresponding 
matrix representation. 


Corollary 5.2.25
Let $T : V \to W$ be linear, $\dim V = n$ and $\dim W = m$. Let $M = [T]_{\mathcal{B}_V}^{\mathcal{B}_W}$ be the matrix
representation of $T$ relative to domain basis $\mathcal{B}_V$ and codomain basis $\mathcal{B}_W$. Then $T$ is invertible if,
and only if, the columns of $M$ form a basis for $\mathbb{R}^m$ and the rows of $M$ form a basis for $\mathbb{R}^n$.


Proof. Let $V$ and $W$ be vector spaces with $\dim V = n$ and $\dim W = m$. We know that $T$ can be invertible
only if $V \cong W$. That is, $m = n$. We also know, by Theorems 5.2.20 and 5.2.21, that

\[
\begin{aligned}
T \text{ is invertible} &\iff M \text{ is invertible} \\
&\iff \text{The columns of } M \text{ form a basis for } \mathbb{R}^n \\
&\iff \text{The rows of } M \text{ form a basis for } \mathbb{R}^n.
\end{aligned}
\]


Example 5.2.26 Recall that the transformation given in Example 5.2.22 is invertible. The matrix for the
transformation is

\[
M = [T]_\alpha = \begin{pmatrix} -3 & 0 & 0 \\ 0 & 0 & -2 \\ 0 & 2 & 0 \end{pmatrix}.
\]

The rows of $M$ form a linearly independent set, and the columns of $M$ form a linearly independent set.
Corollary 5.2.25 confirms the invertibility of $T$.


Watch Your Language! It is common practice to describe both transformations and matrices as
"invertible" and as having an "inverse." However, it is not always appropriate to use the same language for
both. Acceptable language for discussing properties of a transformation and its corresponding matrix
representation is provided here.

✓ The transformation $T$ is one-to-one.
✓ The matrix representation $M$ has linearly independent rows.
✓ The transformation $T$ is onto.
✓ The matrix representation $M$ has linearly independent columns.
✓ The transformation $T$ is an isomorphism.
✓ The matrix representation is row equivalent to the identity matrix.
✓ $T$ is invertible.
✓ $M$ is invertible.
✓ $T$ has an inverse, $T^{-1}$.
✓ $M$ has an inverse, $M^{-1}$.


It is not accepted practice to say

✗ The transformation $T$ has linearly independent rows.
✗ The matrix representation $M$ is one-to-one.
✗ The transformation $T$ has linearly independent columns.
✗ The matrix representation $M$ is onto.
✗ The transformation $T$ can be reduced to the identity.
✗ The matrix representation is an isomorphism.

Path to New Applications

Linear programming is a tool of optimization. Solving systems of equations and matrix equations 
are necessary tools for finding basic solutions. To determine whether or not there is a basic 
solution, researchers refer to the Invertible Matrix Theorem (Theorem 7.3.16). See Section 8.3.3 
to learn more about how to connect linear programming to linear algebra tools. 


5.2.3 Exercises 


1. For each of the following matrices, find Null(M), Col(M), Rank(M), Nullity(M), size(M), the
number of columns without leading entries, and the number of leading entries in the echelon form.


123-1 
lie 1 
@) M1535 9 
56-1-5 


12 3 -1 0 
13-1-1 2 
33-1 -2-1 


101 
{ 1=1 
(c) M=] 222 
3 14 


-10 1 
3000 
0200 
0110 


D2 


Nm BW 
2. Verify by example, using the matrices below, that Theorem 5.2.19 is true. That is, show
Theorem 5.2.19 is true by finding $M^{-1}$, $\det(M)$, and $\det(M^{-1})$ using matrix reduction. This
is not a proof, but the proof is very similar.


11 
2-1 
w= (22 


(c) $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ (Assume $ad - bc \neq 0$.)


(24 
(d) M={111 
011 
2 Oi 
(ec) M=|-121 
112 


3. How does nullity(M) show up in the echelon form of the matrix reduction?

4. How does rank(M) show up in the echelon form of the matrix reduction?

5. How are dim V and dim W related to M?

6. Use the Rank-Nullity Theorem to make a conjecture that brings together a relationship with all or
some of the answers to Exercises 3, 4, and 5.

7. The invertible matrix theorem is an important theorem. Fill in the blanks or circle the correct
answer below to complete the statement of the theorem.

(a) $Ax = b$ has a unique solution if ________
(Choose one: $A$ is invertible or $A$ is not invertible).
(b) $A$ is invertible if and only if $\det(A)$ ________.
(c) $Ax = b$ has a unique solution if $\det(A)$ ________.

8. Explain why the columns of M may not form a basis for Col(M) in Definition 5.2.7.

9. Prove Corollary 5.2.3.

10. Prove Theorem 5.2.8.

11. Prove Theorem 5.2.9.

12. Prove Corollary 5.2.10.

Which of the following statements are equivalent to those given in the Invertible Matrix Theorem? 


Prove or disprove the equivalency of each statement. Use the conditions and notation given in 
Theorems 5.2.20 and 5.2.21. 


13. $M$ is one-to-one.

14. The reduced echelon form of $M$ is the identity matrix $I_n$.

15. $\mathrm{Ran}(T) = \mathbb{R}^n$.

16. $Mx = b$ has at least one solution $x \in \mathbb{R}^n$ for each $b \in \mathbb{R}^n$.

17. $M^2$ is invertible.

18. $\mathrm{Null}(M) = \{0\}$.

tard = MT, 

20. If $\{b_1, b_2, \ldots, b_n\}$ is a basis for $\mathbb{R}^n$ then $\{Mb_1, Mb_2, \ldots, Mb_n\}$ is also a basis for $\mathbb{R}^n$.
21. If $\{b_1, b_2, \ldots, b_n\}$ is a basis for $V$ then $\{Tb_1, Tb_2, \ldots, Tb_n\}$ is a basis for $W$.
22. Explain why the nullspace of a matrix can be viewed as the solution set of a homogeneous system
of equations.

5.3 Exploration: Reconstruction Without an Inverse


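In this exploration, we construct and use left-inverse transformations for linear transformations that
are one-to-one, but not necessarily isomorphisms.

Suppose we have a radiographic transformation $T : \mathcal{O} \to \mathcal{R}$, where $\mathcal{O}$ is the object space with
ordered basis $\mathcal{B}_{\mathcal{O}}$ and $\mathcal{R}$ is the radiograph space with ordered basis $\mathcal{B}_{\mathcal{R}}$. For any object vector $x \in \mathcal{O}$ and
corresponding radiograph $b \in \mathcal{R}$, we have the equivalent matrix representation $[T]_{\mathcal{B}_{\mathcal{O}}}^{\mathcal{B}_{\mathcal{R}}}[x]_{\mathcal{B}_{\mathcal{O}}} = [b]_{\mathcal{B}_{\mathcal{R}}}$.
We will consider transformations $T$ for which a left-inverse transformation $S : \mathcal{R} \to \mathcal{O}$ exists (see
Definition 4.7.53). $S$ has the matrix representation $[S]_{\mathcal{B}_{\mathcal{R}}}^{\mathcal{B}_{\mathcal{O}}}$. The existence of $S$ allows us to recover an
object from radiographic data:

\[
[S]_{\mathcal{B}_{\mathcal{R}}}^{\mathcal{B}_{\mathcal{O}}}[b]_{\mathcal{B}_{\mathcal{R}}} = [S]_{\mathcal{B}_{\mathcal{R}}}^{\mathcal{B}_{\mathcal{O}}}[T]_{\mathcal{B}_{\mathcal{O}}}^{\mathcal{B}_{\mathcal{R}}}[x]_{\mathcal{B}_{\mathcal{O}}} = [I]_{\mathcal{B}_{\mathcal{O}}}^{\mathcal{B}_{\mathcal{O}}}[x]_{\mathcal{B}_{\mathcal{O}}} = [x]_{\mathcal{B}_{\mathcal{O}}}.
\]

In order to simplify the notation for this section, we will write $Tx = b$ and $Sb = S(Tx) = Ix = x$
with the understanding that we are working in the coordinate spaces relative to the bases $\mathcal{B}_{\mathcal{O}}$ and $\mathcal{B}_{\mathcal{R}}$.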


5.3.1 Transpose of a Matrix 


To begin our study, we recall Definition 2.5.14, the definition of the transpose of a matrix. Here, we 
restate the definition more concisely. We also provide some properties of the transpose. 


Definition 5.3.1 


The transpose of an $n \times m$ matrix $A$, denoted $A^T$, is the $m \times n$ matrix formed by interchanging the
columns and rows. That is, $(A^T)_{j,i} = A_{i,j}$ for all $1 \le i \le n$ and $1 \le j \le m$.


Theorem 5.3.2
Properties of the transpose. Let $A$ and $B$ be $m \times n$ matrices and $C$ an $n \times k$ matrix.


1. $(A^T)^T = A$.

2. If $D = A^T$, then $D^T = A$.

3. $(AC)^T = C^T A^T$.

4. $(A + B)^T = A^T + B^T$.


Proof. In this proof, let indices $i$ and $j$ be arbitrary over their expected ranges, and let $A_{ij}$ be the entry
of $A$ in the $i$th row and $j$th column. In each case, we show that two matrices $X$ and $Y$ are equal by
showing that arbitrary entries are equal: $X_{ij} = Y_{ij}$.


1. $((A^T)^T)_{ij} = (A^T)_{ji} = A_{ij}$, so $(A^T)^T = A$.

2. Suppose $D = A^T$. Then, by (1), $A = (A^T)^T = D^T$.

3. Let $\tilde{A} = A^T$ and $\tilde{C} = C^T$. Then $((AC)^T)_{ij} = (AC)_{ji} = \sum_{\ell=1}^{n} A_{j\ell}C_{\ell i} = \sum_{\ell=1}^{n} \tilde{C}_{i\ell}\tilde{A}_{\ell j} = (\tilde{C}\tilde{A})_{ij}
= (C^T A^T)_{ij}$. Thus, $(AC)^T = C^T A^T$.

4. $((A + B)^T)_{ij} = (A + B)_{ji} = A_{ji} + B_{ji} = (A^T)_{ij} + (B^T)_{ij} = (A^T + B^T)_{ij}$. Thus, $(A + B)^T =
A^T + B^T$.
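The four properties of Theorem 5.3.2 are easy to spot check numerically. The sketch below uses small integer matrices (hypothetical choices with the sizes required by the theorem, $m = 2$, $n = 3$, $k = 2$), so the comparisons are exact. The operator .' is the plain transpose in OCTAVE/MATLAB; for real matrices it agrees with the more common ' operator.

```octave
% Hypothetical test matrices: A and B are 2 x 3, C is 3 x 2
A = [1 2 3; 4 5 6];
B = [0 -1 2; 3 1 1];
C = [1 0; 2 1; -1 4];

D = A.';                                     % D is the transpose of A

prop1 = isequal((A.').', A);                 % (A^T)^T = A
prop2 = isequal(D.', A);                     % if D = A^T, then D^T = A
prop3 = isequal((A*C).', (C.')*(A.'));       % (AC)^T = C^T A^T
prop4 = isequal((A + B).', A.' + B.');       % (A + B)^T = A^T + B^T

disp([prop1 prop2 prop3 prop4])              % each entry is 1 when the property holds
```

Note that the order of the factors in property 3 matters: the product (A.')*(C.') is not even defined for these sizes.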


5.3.2 Invertible Transformation 


Now consider the following example: We are given a radiograph with M = 24 pixels that was created 
by applying some radiographic transformation, T to an object with N = 16 voxels. 


1. Give a scenario for a radiographic transformation $T$ that fits the above example. Don't calculate a
$T$, rather give the following:

- Size of the object: ____ × ____
- Number of pixels per view: ____
- Number of views: ____

2. Suppose we know $b$ and we want to find $x$. This means that we want
$x = T^{-1}b$.

(a) What properties must the transformation $T$ have so that $T$ is invertible?

(b) What properties must the transformation $T$ have so that a left-inverse of $T$ exists?

(c) What matrix properties must the matrix representation of $T$ have so that it is invertible?

(d) When $N < M$ (as in the example above), what matrix properties must the matrix representation
of $T$ have so that it has a left-inverse?

3. For ease of notation, when the bases are understood, we often use the same notation for the matrix
and the transformation, that is, we let $T$ represent both the transformation and its associated matrix.
Suppose $N < M$ and a left-inverse, $P$, of $T$ exists. This means that $x = Pb$. If $T$ were invertible,
we would have $P = T^{-1}$. But, in the example above, we know that $T$ is not invertible. Using the
following steps, find the left-inverse of $T$.

(a) Because $Tx = b$, for any linear transformation (matrix) $A$, we can write $ATx = Ab$. This is
helpful if $AT$ is invertible. Since $T$ is one-to-one, we know that for $AT$ to be invertible, the
only vector in Ran(T) that can be in Null(A) is the zero vector. What other properties must $A$
have so that $AT$ is invertible?

(b) Provide a matrix, $A$, so that $A$ has the properties you listed in 3a and so that $AT$ is invertible.

(c) Solve for $x$ in the matrix equation $ATx = Ab$ using the $A$ you found and provide a
representation of the left-inverse $P$.


4. Putting this all together now, state the necessary and sufficient condition for $T$ to have a left-inverse.


5.3.3 Application to a Small Example 


Consider the following radiographic example. 


• Total number of voxels: N = 16 (n = 4).
• Total number of pixels: M = 24
• ScaleFac = 1
• Number of views: a = 6
• View angles: θ1 = 0°, θ2 = 20°, θ3 = 40°, θ4 = 60°, θ5 = 80°, θ6 = 100°.


5. Use tomomap.m to compute T and verify that the left-inverse of T must exist. The function
tomomap returns a transformation matrix in sparse format. To use and view as a full matrix array
use the command T=full(T); after constructing T.

6. Compute the left-inverse P. Use P to find the object that created the following radiograph vector.
(You should be able to copy and paste this into OCTAVE or MATLAB.)


b=[0.00000 
32.00000 
32.00000 
0.00000 
1.97552 
30.02448 
30.02448 
1.97552 
2.71552 
29.28448
29.28448 
2.71552 
2.47520 
29.52480 
29.52480 
2.47520 
1.17456 
30.82544 
30.82544 
1.17456 
1.17456 
30.82544 
30.82544 
1.17456] 


5.3.4 Application to Brain Reconstruction 
Now, we can reconstruct some brain images from radiographic data. This section will guide you in this 
process. 

7. Collect the following necessary files and place them in your working OCTAVE/MATLAB directory. 


(a) Data File: Lab5radiographs.mat 
(b) Plotting Script: ShowSlices.m 
(c) OCTAVE/MATLAB Code: tomomap.m 


8. Choose whether to create a new script file (“.m’” file) for all of your commands or to work at the 
OCTAVE/MATLAB prompt. 
9. Load the provided radiographic data with the following command. 


load Lab5radiographs.mat


This command loads a variable named B that is a 12960x362 array. Each column is a radiograph
vector corresponding to one horizontal slice of the human head. Each radiograph has 12960 total
pixels spread across many views. The first 181 columns are noiseless radiographs, and the last 181
columns are corresponding radiographs with a small amount of noise added.

10. Use the familiar function tomomap.m to construct the transformation operator T corresponding
to the following scenario (which is the scenario under which the radiographic data was obtained):
n = 108, m = 108, ScaleFac = 1, and 120 view angles: the first at 1°, the last at 179°, and the
rest equally spaced in between (hint: use the linspace command).

11. Some OCTAVE/MATLAB functions do not work with sparse arrays (such as our T). So, just make
T a full array with this command:


T=full(T); 


12. It is tempting to compute the one-sided inverse P as found in (3c). However, such a large matrix
takes time to compute and too much memory for storage. Instead, we can use a more efficient
solver provided by OCTAVE/MATLAB. If we seek a solution to Lx = b, for invertible matrix L,
we find the unique solution by finding $L^{-1}$ and then multiplying it by b. OCTAVE/MATLAB does
both operations together in an efficient way (by not actually computing $L^{-1}$) when we use the
command x=L\b.

As a first test, we will try this with the 50th radiograph in the matrix B. That is, we will reconstruct 
the slice of the brain that produced the radiograph that is represented in the 50th column of B. We 
want to solve the equation found in (3c): AT x = Ab using the A matrix which you found in (3b). 


b=B(:,50);
x=(A*T)\(A*b);


The vector x is the coordinate vector for our first brain slice reconstruction. To view this 
reconstruction, use the following commands 


figure;
x=reshape(x,108,108);
imagesc(x,[0,255]);


The reshape command is necessary above because the result x is a (108 · 108) × 1 vector, but the
object is a 108 × 108 image.

13. Notice also that x and b could be matrices, say X and B. In this case, each column of X is the
unique solution (reconstruction) for the corresponding column of B. Use these ideas to reconstruct
all 362 brain slices using a single OCTAVE/MATLAB command.

Use the variable name X for the results. Make sure that X is an 11664x362 array. Now, the first 
181 columns are reconstructions from noiseless data, and the last 181 columns are reconstructions 
from noisy data. 

14. Run the script file ShowSlices.m that takes the variable X and plots example slice reconstructions.
Open ShowSlices.m in an editor and observe the line


slices=[50 90 130]; 


Any three slices can be chosen to plot by changing the slice numbers. In the figure, the left column 
of images are reconstructions from noiseless data, the right column of images are reconstructions 
from the corresponding noisy data. IMPORTANT: Once X is computed it is only necessary to 
run ShowSlices.m to view different slices; running the other commands described above is time 
consuming and unnecessary. 
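
The backslash solve described above can be tried on a small system before it is applied to the brain data. The sketch below is only an illustration under stated assumptions: the matrix L, the columns of X_true, and the name B_small are hypothetical and are unrelated to the lab files. It compares the explicit-inverse approach with the backslash command, and shows that a matrix right-hand side is solved one column at a time.

```octave
% Hypothetical small invertible system (not the lab data)
L = [2 1 0; 1 3 1; 0 1 4];
X_true = [1 0; 2 -1; 3 5];     % two "objects", one per column
B_small = L * X_true;          % the corresponding right-hand sides

X_inv   = inv(L) * B_small;    % forms the inverse explicitly, then multiplies
X_slash = L \ B_small;         % solves for every column without forming inv(L)

disp(norm(X_inv - X_true))     % both errors should be near machine precision
disp(norm(X_slash - X_true))
```

For a matrix as large as the one in this lab, avoiding the explicit inverse saves both time and memory, which is the reason the backslash command is recommended.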

Congratulations! You have just performed your first brain scan tomography. Using your new tool,
answer the following questions. 


15. Are the reconstructions from the noiseless radiographs exact discrete representations of the brain? 
Based on the concepts you have learned about matrix equations, explain why the reconstructions 
are exact or why not? 

16. Determine a way to find the relative amount of noise in the noisy radiographs. Remember that the
noisy radiograph corresponding to the radiograph represented in the 50th column of B is in the
(181 + 50)th column of B. There are many ways to answer this question, so be creative. Is there
a lot or a little noise in the radiographs?

17. Are the reconstructions from the noisy radiographs exact representations of the brain? 

18. Compare the degree of “noisiness” in the noisy radiographs to the degree of “noisiness” in the 
corresponding reconstructions. Draw some conclusion about what this comparison tells us about 
the practicality of this method for brain scan tomography. 



Diagonalization 


In studying the behavior of heat states, we used a (linear) heat diffusion transformation $E$. Given a
heat state $u(t)$ at time $t$, the heat state at a later time $t + \Delta t$ is given by $u(t + \Delta t) = Eu(t)$. But, this
works well only if the time step $\Delta t$ is sufficiently small. So, in order to describe the evolution of heat
states over a longer period of time, the heat diffusion transformation must be applied many times. For
example, in Figure 6.1, $u(3000\Delta t)$ requires 3000 applications of $E$. That is,

\[
u(3000\Delta t) = E(E(\cdots E(u(0))\cdots)) = E^{3000}u(0).
\]


Performing such repeated operations can be tedious, time-consuming, and numerically unstable. We 
need to find a more robust and practical method for exploring evolutionary transformations that are 
described by repeated operations of a single transformation. 

The key tool in this exploration is the idea that transformations (and vectors) can be represented in 
terms of any basis we choose for the vector space of interest. We will find that not all representations 
are useful, but some are surprisingly simplifying. Using our findings, we will be able to study different 
aspects of evolutionary transformations including basis interpretation, independent state evolution, 
long-term state behavior, and even new findings on invertibility. 


[Figure 6.1: heat states plotted as temperature (u) against position (x).]

Fig. 6.1 In finding each of the heat states in this figure, we must apply repeated applications of E to the initial heat
state, u(0).

6.1 Exploration: Heat State Evolution


In the exercises of Section 4.3, we explored two ways to compute the heat state along a rod after $k$
time steps. We found the matrix representation, $E$, for the heat diffusion transformation, relative to the
standard basis. We considered raising $E$ to a large power, $k$, followed by multiplying by the (coordinate
vector for the) initial heat state:

\[
u(k\Delta t) = E^k u(0).
\]

We also considered iteratively multiplying the (coordinate vector for the) current heat state by $E$:

\[
u(\Delta t) = Eu(0), \quad u(2\Delta t) = Eu(\Delta t), \quad \ldots, \quad u(k\Delta t) = Eu((k-1)\Delta t).
\]

In both scenarios, we found that the results became more and more computationally cumbersome. In
this section, we will explore heat states for which multiplying by $E$ is not cumbersome.
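
The two approaches recalled above can be compared on a small example. The sketch below is an illustration under stated assumptions, not the matrix from Section 4.3: it assumes a hypothetical tridiagonal diffusion matrix with diagonal entries $1 - 2\delta$ and off-diagonal entries $\delta$, with $\delta = 0.2$, five interior positions, and an arbitrary initial heat state u0.

```octave
% Hypothetical diffusion matrix: tridiagonal with 1-2*delta on the diagonal
n = 5;
delta = 0.2;
off = delta * ones(n-1, 1);
E = (1 - 2*delta) * eye(n) + diag(off, 1) + diag(off, -1);

u0 = [0; 1; 3; 1; 0];   % hypothetical initial heat state (coordinate vector)
k = 50;                 % number of time steps

% Approach 1: raise E to the k-th power, then multiply by u0
u_power = (E^k) * u0;

% Approach 2: multiply the current heat state by E, k times
u_iter = u0;
for step = 1:k
  u_iter = E * u_iter;
end

disp(norm(u_power - u_iter))   % the two results agree up to round-off
```

Both approaches require many matrix multiplications as $k$ grows, which motivates the search for heat states on which $E$ acts in a simpler way.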


Throughout the discussion that follows it will be important to pay attention to the basis in which
coordinate vectors are expressed. 


The following tasks will lead the reader through this exploration. 


1. In the following picture, there are 12 different initial heat states (in orange) and their corresponding 
diffusions (colors varying from orange to dark blue). Group the pictures based on the diffusion 
behavior. We are not interested in comparing particular snapshots in time (for example, comparing 
all of the green curves). Rather, we are interested in features of how each state evolves from orange 
to blue. Briefly list the criteria you used to group them. As we continue this exploration, a clear 
description of your chosen criteria will be helpful. 


To continue the exploration, the reader needs to recognize the common feature among heat state 
evolution for heat states 3, 4, and 8. A hint is provided in the first paragraph of the next chapter, but we 
encourage the readers to attempt to recognize this feature without reading the hint. We will continue 
this exploration with this particular feature. 


2. Write the expression for Eu for the special vectors sharing the features you identified for heat 
states 3, 4, and 8. 

3. Now view the diffusion of linear combinations of these special vectors. Use the MATLAB/OCTAVE 
code DiffuseLinearCombination.m. What do you see in the diffusion of a vector that is a linear 
combination of 


(a) Two of these special vectors? 
(b) Three of these special vectors? 


4. Write out algebraically what happens in the diffusion of a heat state that is a linear combination 
of these special vectors. 

5. What if we want to find more of these special vectors? What matrix equation would we solve? 

6. What do this equation and the invertible matrix theorem tell us? 


Exercises 


Recall that our goal is to find the heat state after k time steps of the heat diffusion. We have observed that 
some heat states (special vectors) only change in amplitude with time. We see this when considering 
the matrix representation E and special vectors $v_j$ for which $E v_j = a_j v_j$. Now we will explore what 
could happen if some set of special vectors $\mathcal{B} = \{v_1, v_2, \ldots, v_m\}$ is a basis for $\mathbb{R}^m$. In the following 
exercises, assume that, for the vector space of heat states, there is a basis made up entirely of special 
vectors. 


1. How can we write the heat state u(t) using the basis $\mathcal{B}$? Write this expression for u(t). 

2. Recall that when working within vector spaces, we need to know with respect to which basis we are 
working. When we write that $\mathcal{B} = \{v_1, v_2, \ldots, v_m\}$, we are considering the vectors $v_1, v_2, \ldots, v_m$ 
as coordinate vectors with respect to the standard basis. Is the expression in Exercise 1 written as 
a coordinate vector with respect to the standard basis or with respect to $\mathcal{B}$? 

3. Using your expression in Exercise 1, write an expression for $u(t + \Delta t)$ by multiplying by the 
diffusion matrix E. Are you working with respect to the standard basis or with respect to $\mathcal{B}$? 

4. Use your answer from Exercise 1 to find a representation for the heat state after the 10th iteration 
of diffusion, that is, for $u(t + 10\Delta t)$. Are you working with respect to the standard basis or with 
respect to $\mathcal{B}$? 

5. Write u(t) as a coordinate vector with respect to the basis you did not choose in Exercise 2. For 
simplicity, let's call this new coordinate vector w(t). 

6. Using the notation from Exercise 5, we can write $w(t + \Delta t)$ as the coordinate vector (with respect 
to the same basis as in Exercise 5) for the next heat state. Since E makes sense as the diffusion 
only when multiplying by a coordinate vector with respect to the standard basis, we cannot just 
multiply w(t) by E. Discuss possible ways to find the coordinate vector $w(t + \Delta t)$. 

7. Using your answer from Exercise 6, find a representation for the heat state after the 10th iteration 
of the diffusion. 

8. How might these computations using the basis $\mathcal{B}$ give us information about the long-term behavior 
of the heat diffusion? State your observations. 


6.2 Eigenspaces and Diagonalizable Transformations 


As we explored how heat states evolve under the action of a diffusion transformation E, we found that 
some heat states only change in amplitude. In other words, applying the diffusion transformation to 
one of these heat states results in a scalar multiple of the original heat state. Mathematically, we write 

\[ E v = \lambda v, \tag{6.1} \]

for some scalar $\lambda$ and for one of these special heat states $v \in H_m(\mathbb{R})$. In the last section we saw that it 
is easy to predict how one of these special heat states will diffuse many time steps in the future, which 
makes them of particular interest in our study of diffusion. 

In Section 6.1 we also rewrote equation (6.1) as the matrix equation 

\[ (E - \lambda I) v = 0. \tag{6.2} \]

Since this is a homogeneous equation, we know it has a solution. This means that either there is a 
unique solution (only the trivial solution v = 0) or infinitely many solutions. If we begin with the zero 
heat state (all temperatures are the same everywhere along the rod) then the diffusion is trivial because 
nothing happens. The special vectors we seek are nonzero vectors satisfying the matrix equation (6.2). 
To find them, we need to find values of $\lambda$ so that the matrix $(E - \lambda I)$ has a nontrivial nullspace. 

We also observed previously that, since the diffusion transformation is linear, we can readily predict 
the diffusion of any linear combination of our special heat states. This means that a basis of these 
special heat states is particularly desirable. If $\mathcal{B} = \{v_1, v_2, \ldots, v_m\}$ is a basis of these special vectors 
so that $E v_i = \lambda_i v_i$ and u(0) is our initial heat state, we can write u(0) in coordinates relative to $\mathcal{B}$. 
That is, we can find scalars $\alpha_1, \alpha_2, \ldots, \alpha_m$ so that 

\[ u(0) = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_m v_m. \]

Then, when we apply the diffusion operator to find the heat state, $u(\Delta t)$, one time step ($\Delta t$) later, we 
get 

\begin{align*}
u(\Delta t) &= E u(0) \\
&= E(\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_m v_m) \\
&= \alpha_1 E v_1 + \alpha_2 E v_2 + \cdots + \alpha_m E v_m \\
&= \alpha_1 \lambda_1 v_1 + \alpha_2 \lambda_2 v_2 + \cdots + \alpha_m \lambda_m v_m.
\end{align*}

Now, using this same idea, we can find $u(k\Delta t)$ for k time steps later (far into the future). We get 

\begin{align*}
u(k\Delta t) &= E^k u(0) \\
&= \alpha_1 E^k v_1 + \alpha_2 E^k v_2 + \cdots + \alpha_m E^k v_m \\
&= \alpha_1 \lambda_1^k v_1 + \alpha_2 \lambda_2^k v_2 + \cdots + \alpha_m \lambda_m^k v_m.
\end{align*}

This equation requires no matrix multiplication and $u(k\Delta t)$ can be computed directly without the need 
for computing each intermediate time-step result. Because of this, we can easily predict the long-term 
behavior of this diffusion in a way that is also computationally efficient. 
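
The comparison between repeated multiplication and the eigenvector expansion can be tried on a small case. The following is a minimal sketch in Python, not the diffusion matrix from Section 4.3: the 2 x 2 matrix E, its special vectors, and the helper names mat_vec, evolve_by_iteration, and evolve_by_eigen are all assumptions made for illustration. Exact fractions are used so that the two results can be compared without rounding.

```python
from fractions import Fraction

def mat_vec(E, u):
    """Multiply the matrix E (a list of rows) by the vector u."""
    return [sum(a * x for a, x in zip(row, u)) for row in E]

def evolve_by_iteration(E, u0, k):
    """Compute E^k u0 with k successive matrix-vector products."""
    u = list(u0)
    for _ in range(k):
        u = mat_vec(E, u)
    return u

def evolve_by_eigen(eigvals, eigvecs, alphas, k):
    """Compute sum_i alpha_i * lambda_i^k * v_i, with no matrix products."""
    n = len(eigvecs[0])
    return [
        sum(a * lam**k * v[i] for a, lam, v in zip(alphas, eigvals, eigvecs))
        for i in range(n)
    ]

# Hypothetical 2 x 2 example (an assumption, not the matrix from Section 4.3).
E = [[Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(1, 2)]]
eigvecs = [[1, 1], [1, -1]]                  # E v_1 = (3/4) v_1 and E v_2 = (1/4) v_2
eigvals = [Fraction(3, 4), Fraction(1, 4)]
u0 = [3, 1]                                  # u(0) = 2 v_1 + 1 v_2
alphas = [2, 1]

k = 10
same = evolve_by_iteration(E, u0, k) == evolve_by_eigen(eigvals, eigvecs, alphas, k)
print(same)
```

In this sketch the second function never forms a product with E; it only raises the two scalars to the power k, which is the point of the formula above.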


6.2.1. Eigenvectors and Eigenvalues 


Nonzero vectors satisfying equation (6.1) are important in both of the main applications (diffusion 
welding and radiography/tomography) in this text. Such vectors also arise in important ways in many 
other linear algebra concepts and applications. We assign the following terminology to these vectors 
and their corresponding scalars. 


Definition 6.2.1 


Let V be a vector space and $L : V \to V$ a linear transformation. If for some scalar $\lambda$ and some 
nonzero vector $v \in V$, $L(v) = \lambda v$, then we say v is an eigenvector of L with eigenvalue $\lambda$. 


As with heat states, we see that eigenvectors of a linear transformation only change amplitude (and 
possibly sign) when they are mapped by the transformation. This makes repetitive applications of a 
linear transformation on its eigenvectors very simple. 


> Important Note: Throughout our discussion of eigenvectors and eigenvalues we will assume that 
L is a linear transformation on the n-dimensional vector space V. That is, $L : V \to V$ is linear. We 
will also let M be a matrix representation of L relative to some basis. The reader should be aware of 
whether we are working directly in the vector space V or whether we have fixed a basis and hence 
are working in the associated coordinate space. 

Example 6.2.2 Consider the linear transformation $L : \mathbb{R}^2 \to \mathbb{R}^2$ defined by L(x) = Mx where 
$M = \begin{pmatrix} 1 & 1 \\ 3 & -1 \end{pmatrix}$. The vector $v = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is an eigenvector of M with eigenvalue 2 because 

\[ L(v) = Mv = \begin{pmatrix} 1 & 1 \\ 3 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \end{pmatrix} = 2v. \]

The vector $w = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$ is not an eigenvector of L because 

\[ L(w) = Mw = \begin{pmatrix} 1 & 1 \\ 3 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix} \neq \lambda \begin{pmatrix} 1 \\ 2 \end{pmatrix} \]

for any scalar $\lambda$. 


Example 6.2.3 Consider the vector space of 7-bar LCD characters, $V = D(\mathbb{Z}_2)$, and the linear 
transformation T on V whose action is to flip any character upside-down. 

[Figure: example LCD characters and their images under T.]

We see that any nonzero character x with up-down symmetry is an eigenvector of T with eigenvalue 
1 because T(x) = x = 1x. 
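
The check carried out by hand in Example 6.2.2 can be sketched in a few lines of Python. This is only an illustration: the helper names mat_vec and eigenvalue_of are our own, and the test vector (1, 2) is included as an assumed example of a vector that is not an eigenvector of M. The function returns the scalar $\lambda$ when $Mv = \lambda v$ for a nonzero v, and None otherwise.

```python
from fractions import Fraction

def mat_vec(M, v):
    """Multiply the matrix M (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def eigenvalue_of(M, v):
    """Return lam if v is nonzero and M v = lam v; return None otherwise."""
    if all(x == 0 for x in v):
        return None                      # the zero vector is never an eigenvector
    Mv = mat_vec(M, v)
    # Read off the only possible scalar from the first nonzero entry of v.
    i = next(idx for idx, x in enumerate(v) if x != 0)
    lam = Fraction(Mv[i], v[i])
    # v is an eigenvector exactly when every entry scales by that same lam.
    return lam if all(y == lam * x for x, y in zip(v, Mv)) else None

M = [[1, 1],
     [3, -1]]                            # the matrix of Example 6.2.2

print(eigenvalue_of(M, [1, 1]))          # v = (1, 1)
print(eigenvalue_of(M, [1, 2]))          # (1, 2): assumed non-eigenvector for illustration
```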


In order to find the eigenvalues and eigenvectors of a transformation $L : V \to V$, we need to find 
$\lambda$ so that $(L - \lambda I)(v) = 0$ has nontrivial solutions v; equivalently, we can consider the 
matrix equation $(M - \lambda I)v = 0$. If we find any eigenvector v with eigenvalue $\lambda$, then any (nonzero) 
scalar multiple of v is also an eigenvector with eigenvalue $\lambda$ (see Exercise 19). Also, if we add any 
two eigenvectors v and w with eigenvalue $\lambda$, then their sum v + w, if nonzero, is an eigenvector with 
eigenvalue $\lambda$ (see Exercise 20). This leads to the idea of an eigenspace of L (and of M) for any 
eigenvalue $\lambda$. 

It is important to note that eigenvalues are elements of the scalar set over which scalar-vector 
multiplication is defined. In this text, when the scalar field is $\mathbb{R}$, we consider only eigenvalues $\lambda \in \mathbb{R}$. 
Here, we define the eigenspace corresponding to an eigenvalue, $\lambda$, to be the vector space consisting of 
the zero vector together with all eigenvectors whose eigenvalue is $\lambda$. 


Definition 6.2.4 


Let V be a vector space and $L : V \to V$ a linear transformation. Let M be the matrix of the 
transformation L with respect to basis $\mathcal{B}$ of V. Suppose that $\lambda$ is an eigenvalue of L (and hence, 
also of M). Then 

\[ \mathcal{E}_\lambda = \{ v \in V \mid Lv = \lambda v \} = \mathrm{Null}(L - \lambda I) \]

is called the eigenspace of L (and of M) corresponding to eigenvalue $\lambda$. 


Exercise 33 asks the reader to verify that every eigenspace of a linear transformation L is, indeed, 
a subspace of V. 

We conclude with a brief remark connecting the nullspace of a transformation to its eigenspaces. 
Suppose v is an eigenvector of the linear transformation L with eigenvalue $\lambda = 0$. Because 
$L(v) = \lambda v = 0v = 0$, we have $v \in \mathrm{Null}(L)$. However, there is one vector, namely $w = 0 \in \mathrm{Null}(L)$, that is not an 
eigenvector of L. We want to be careful to note that eigenspaces contain the zero vector, but the zero 
vector is not an eigenvector. 


6.2.2 Computing Eigenvalues and Finding Eigenvectors 


In this section, we outline a procedure, called the Factor Algorithm, for determining the eigenvectors 
and eigenvalues of a matrix. We will give examples showing how to implement this procedure. Finally, 
we will discuss eigenvalues of diagonal matrices and eigenvalues of the transpose of a matrix. We 
begin with the algorithm. 


> Factor Algorithm for finding Eigenvectors and Eigenvalues Given an $n \times n$ square matrix M, the 
algorithm for finding eigenvectors and eigenvalues is as follows. 


1. Choose a basis $\mathcal{B} = \{w_1, w_2, \ldots, w_n\}$ for $\mathbb{R}^n$, and select one basis vector $w_\ell \in \mathcal{B}$. 

2. Compute the elements of the set $S = \{w_\ell, M w_\ell, M^2 w_\ell, \ldots, M^k w_\ell\}$ where k is the first index 
for which the set is linearly dependent. 

3. Find a linear dependence relation for the elements of S. This linear dependence relation creates 
a matrix equation $p(M) w_\ell = 0$ for some polynomial p(M) of degree k. 

4. Factor the polynomial p(M) to find values of $\lambda \in \mathbb{R}$ and v satisfying $(M - \lambda I)v = 0$. Add any 
such nonzero v to the set of eigenvectors and the corresponding $\lambda$ to the set of eigenvalues. 

5. Repeat, from Step 2, using any basis element $w_\ell \in \mathcal{B}$ that has not yet been used and is not in 
the span of the set of eigenvectors. If all basis vectors have been considered, then the algorithm 
terminates. 


This algorithm is based on a method by W. A. McWorter and L. F. Meyers.¹ It differs from the 
method that many first courses in linear algebra use. We begin with this approach because we find 
other calculations less illuminating for our purposes and the method we present in this section produces 
eigenvectors at the same time as their eigenvalues. For the more standard approach, we refer the reader 
to Subsection 6.2.3. 


We start with a small example of the Factor Algorithm. 


Example 6.2.5 Let us consider the matrix M defined by 

\[ M = \begin{pmatrix} 5 & -3 \\ 6 & -4 \end{pmatrix}. \]

We seek nonzero $v \in \mathbb{R}^2$ and $\lambda \in \mathbb{R}$ so that $(M - \lambda I)v = 0$. This means that $v \in \mathrm{Null}(M - \lambda I)$. Our 
plan is to write v as a linear combination of a set of linearly dependent vectors. 

First, we choose the standard basis $\mathcal{B}$ for $\mathbb{R}^2$ and begin the Factor Algorithm with the first basis 
vector $w = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. We now create the set S. Notice that 

\[ Mw = \begin{pmatrix} 5 & -3 \\ 6 & -4 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix}. \]

Notice also that {w, Mw} is a linearly independent set. So, we continue by computing $M^2 w$, 

\[ M^2 w = \begin{pmatrix} 5 & -3 \\ 6 & -4 \end{pmatrix} \begin{pmatrix} 5 \\ 6 \end{pmatrix} = \begin{pmatrix} 7 \\ 6 \end{pmatrix}. \]

Now, we have the linearly dependent set 

\[ S = \{w, Mw, M^2 w\} = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 5 \\ 6 \end{pmatrix}, \begin{pmatrix} 7 \\ 6 \end{pmatrix} \right\}. \]

Therefore, we can find $\alpha_1, \alpha_2, \alpha_3$, not all zero, that satisfy the linear dependence relation 

\[ \alpha_1 \begin{pmatrix} 7 \\ 6 \end{pmatrix} + \alpha_2 \begin{pmatrix} 5 \\ 6 \end{pmatrix} + \alpha_3 \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 0. \]

In fact, one linear dependence relation is given by $\alpha_1 = 1$, $\alpha_2 = -1$, and $\alpha_3 = -2$. That is, 

\[ M^2 w - Mw - 2w = 0. \]

Now, we seek a factorization of the left side of the matrix equation above in the form 
$(M - \lambda I)(M + aI)w$ for some scalars $\lambda$ and a. We will explore two methods to find this factorization. First, we 
recognize that 

\begin{align*}
M^2 w - Mw - 2w &= (M^2 - M - 2I)w \\
&= (M - 2I)(M + I)w \\
&= (M - 2I)(Mw + w) \\
&= (M + I)(Mw - 2w),
\end{align*}

where the last two lines are the two ways of grouping the factors. From here, we have two eigenvalues, 
2 and -1, corresponding to the eigenvectors Mw + w and Mw - 2w, respectively. 

Alternatively, we can find $\lambda$ and a so that 

\[ M^2 w - Mw - 2w = (M - \lambda I)(Mw + aw). \]

We have 

\[ (M - \lambda I)(Mw + aw) = M^2 w + (a - \lambda)Mw - a\lambda w \overset{\text{set}}{=} M^2 w - Mw - 2w = 0. \]

Matching up like terms and equating their coefficients, we have the system of nonlinear equations 

\begin{align*}
-1 &= a - \lambda, \\
-2 &= -a\lambda.
\end{align*}

Eliminating a, we have the equation in $\lambda$: 

\[ \lambda^2 - \lambda - 2 = 0. \]

Again, we find two solutions: $\lambda = -1$, a = -2 and $\lambda = 2$, a = 1. Consider these two cases separately. 


1. When $\lambda = -1$ we have (M + I)(Mw - 2w) = 0, from which we see that if we assign $v_1 = Mw - 2w$, 
we have $(M + I)v_1 = 0$. In other words, $Mv_1 = -Iv_1 = -v_1$. This means that $v_1$ (or any 
nonzero multiple of $v_1$) is an eigenvector with eigenvalue -1. We now compute $v_1$. 

\[ v_1 = Mw - 2w = \begin{pmatrix} 3 \\ 6 \end{pmatrix}. \]

Thus, 

\[ \lambda_1 = -1, \quad \mathcal{E}_1 = \mathrm{Span}\{v_1\} = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\}. \]

2. When $\lambda = 2$ we have (M - 2I)(Mw + w) = 0, from which we find 

\[ v_2 = Mw + w = \begin{pmatrix} 6 \\ 6 \end{pmatrix}. \]

Thus, 

\[ \lambda_2 = 2, \quad \mathcal{E}_2 = \mathrm{Span}\{v_2\} = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}. \]


At this point, we have two eigenvectors $v_1$ and $v_2$, and every vector in the starting basis $\mathcal{B}$ is in the span 
of the set of eigenvectors. So, the algorithm terminates. We have a full description of the eigenvalues 
and eigenspaces of M because the eigenvectors we have found form a basis for $\mathbb{R}^2$. In other words, 
every vector in $\mathbb{R}^2$ can be written as a sum of vectors from the eigenspaces, that is, $\mathbb{R}^2 = \mathcal{E}_1 \oplus \mathcal{E}_2$. 
(See Definition 2.5.27.) 


¹ William A. McWorter, Jr. and Leroy F. Meyers, "Computing Eigenvalues and Eigenvectors without Determinants." 
Mathematics Magazine, Vol. 71, No. 1, Taylor & Francis. Feb. 1998. 
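
One pass of the Factor Algorithm on the matrix of Example 6.2.5 can be sketched in Python as follows. This is an illustration under stated assumptions rather than a full implementation: the helper names express, krylov_relation, and eigenvector_from_root are our own, the factoring in Step 4 is not automated (the roots 2 and -1 are supplied by hand), and only a single starting vector is processed. Exact fractions keep the linear dependence test free of rounding.

```python
from fractions import Fraction

def mat_vec(M, v):
    """Multiply the matrix M (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def express(basis, target):
    """Solve sum_i x_i * basis[i] = target exactly; return None if impossible."""
    n, k = len(target), len(basis)
    # Augmented matrix: one row per coordinate, columns = basis vectors, then target.
    A = [[Fraction(basis[j][i]) for j in range(k)] + [Fraction(target[i])]
         for i in range(n)]
    pivots, row = [], 0
    for col in range(k):
        p = next((r for r in range(row, n) if A[r][col] != 0), None)
        if p is None:
            continue
        A[row], A[p] = A[p], A[row]
        pivot = A[row][col]
        A[row] = [x / pivot for x in A[row]]
        for r in range(n):
            if r != row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[row])]
        pivots.append(col)
        row += 1
    # A zero row with a nonzero right-hand side means target is not in the span.
    if any(all(x == 0 for x in A[r][:k]) and A[r][k] != 0 for r in range(n)):
        return None
    x = [Fraction(0)] * k
    for r, col in enumerate(pivots):
        x[col] = A[r][k]
    return x

def krylov_relation(M, w):
    """Steps 2-3: return [w, Mw, ..., M^(k-1) w] and c with M^k w = sum_i c[i] M^i w."""
    vectors = [list(w)]
    while True:
        nxt = mat_vec(M, vectors[-1])
        coeffs = express(vectors, nxt)
        if coeffs is not None:
            return vectors, coeffs
        vectors.append(nxt)

def eigenvector_from_root(vectors, coeffs, lam):
    """Step 4 for a known root lam of p: return q(M) w, where p(t) = (t - lam) q(t)."""
    k = len(coeffs)
    # p(t) = t^k - c[k-1] t^(k-1) - ... - c[0], listed from highest degree down.
    p = [Fraction(1)] + [-c for c in reversed(coeffs)]
    b = [p[0]]
    for a in p[1:]:                      # synthetic division by (t - lam)
        b.append(a + lam * b[-1])
    if b[-1] != 0:
        raise ValueError("lam is not a root of the dependence polynomial")
    q = b[:-1]                           # q[0] multiplies M^(k-1) w, q[-1] multiplies w
    n = len(vectors[0])
    return [sum(q[k - 1 - i] * vectors[i][j] for i in range(k)) for j in range(n)]

M = [[5, -3],
     [6, -4]]
w = [1, 0]
vectors, coeffs = krylov_relation(M, w)          # M^2 w = 2 w + 1 Mw
v_for_2 = eigenvector_from_root(vectors, coeffs, 2)
v_for_minus_1 = eigenvector_from_root(vectors, coeffs, -1)
```

The synthetic division reproduces the grouping used in the example: dividing the dependence polynomial by the factor belonging to one root leaves the polynomial that is applied to w to produce the eigenvector for that root.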


In general, if we are given an $n \times n$ matrix M, we can expect at most n eigenvalues. 
(See Theorem 6.2.16.) The Factor Algorithm procedure was simple in the example above 
because we obtained two distinct eigenvalues of the $2 \times 2$ matrix M by considering a single set of 
linearly dependent vectors. Typically, we may need to repeat the steps in the algorithm to find all 
eigenvalues and eigenvectors. In addition, we may find fewer than n eigenvalues. These two scenarios 
are demonstrated in the next two examples. 


Example 6.2.6 Find the eigenvalues and eigenvectors of the linear transformation $L : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ 
defined by 

\[ L(a + bx + cx^2) = (a - b + c) + (a + b - c)x + (-a + b + c)x^2. \]

Consider the standard basis $P = \{1, x, x^2\}$ for $P_2(\mathbb{R})$. We have the matrix representation of L 
relative to this basis: 

\[ M = \begin{pmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ -1 & 1 & 1 \end{pmatrix}. \]

Notice that M has rank 3 and is invertible. We can see this by checking any of the many statements given 
in Theorem 5.2.20. So, we know that L is an isomorphism. Thus, Null(L) = {0} and $\mathrm{Ran}(L) = P_2(\mathbb{R})$, 
and further, 0 is not an eigenvalue of L (or M). We would like to describe the eigenspaces in $\mathbb{R}^3$ relative 
to M. 

We begin the Factor Algorithm by choosing the standard basis $\mathcal{B}$ for $\mathbb{R}^3$ and initial basis vector 
$w = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \in \mathcal{B}$. We next compute the linearly dependent set of vectors $\{w, Mw, M^2 w, \ldots, M^k w\}$ 
with smallest k. We have 

\[ S = \left\{ w = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},\ Mw = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix},\ M^2 w = \begin{pmatrix} -1 \\ 3 \\ -1 \end{pmatrix},\ M^3 w = \begin{pmatrix} -5 \\ 3 \\ 3 \end{pmatrix} \right\}. \]

The first three vectors in the set are linearly independent. We have the linear dependence relation 

\[ M^3 w - 3M^2 w + 6Mw - 4w = 0 \]

for which we seek scalars $\lambda, \alpha, \beta$ that give the factorization 

\[ (M - \lambda I)(M^2 w + \alpha Mw + \beta w) = 0. \]

We have 

\begin{align*}
(M - \lambda I)(M^2 w + \alpha Mw + \beta w) &= M^3 w + (\alpha - \lambda)M^2 w + (\beta - \alpha\lambda)Mw - \beta\lambda w \\
&\overset{\text{set}}{=} M^3 w - 3M^2 w + 6Mw - 4w.
\end{align*}

Equating coefficients of like terms, we have the nonlinear system of equations: 

\begin{align*}
-3 &= \alpha - \lambda \\
6 &= \beta - \alpha\lambda \\
-4 &= -\beta\lambda.
\end{align*}

Using substitution, we see this has solutions satisfying 

\[ 0 = \lambda^3 - 3\lambda^2 + 6\lambda - 4 = (\lambda - 1)(\lambda^2 - 2\lambda + 4). \]

This cubic equation has one real root, $\lambda_1 = 1$, and two complex roots, $\lambda_{2,3} = 1 \pm \sqrt{3}\,i$. (We can arrive 
at the same result by observing that the linear dependence relation can be written as 
$(M^3 - 3M^2 + 6M - 4I)w = 0$, which factors to $(M - I)(M^2 - 2M + 4I)w = 0$.) The real root is an eigenvalue of 
M and the corresponding eigenvector is 

\[ v_1 = M^2 w + \alpha Mw + \beta w = M^2 w - 2Mw + 4w = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]

We now have a set of eigenvectors consisting of the single vector $v_1$. Step 5 of the algorithm 
requires us to choose, if possible, another vector from the basis $\mathcal{B}$ which is not in the span of the set 
of eigenvectors. Suppose we choose $w = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$. Repeating Steps 2 through 5, we find the following 
results which the reader is encouraged to check. 

\[ S = \left\{ w = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},\ Mw = \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix},\ M^2 w = \begin{pmatrix} -1 \\ -1 \\ 3 \end{pmatrix},\ M^3 w = \begin{pmatrix} 3 \\ -5 \\ 3 \end{pmatrix} \right\} \]

\[ M^3 w - 3M^2 w + 6Mw - 4w = 0. \]

This is the same linear dependence relation and no new eigenvector information is revealed. We continue 
by choosing the final basis vector, $w = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$, which also results in the same linear dependence relation. 
The basis vectors are now exhausted and the algorithm terminates. 


We have only 

\[ \lambda_1 = 1, \quad \mathcal{E}_1 = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}. \]

So, any nonzero vector in $\mathcal{E}_1$ is an eigenvector of the matrix M. What are the eigenvectors of the 
transformation L? We used the standard basis to obtain M from L, so $v_1 = [p_1(x)]_P$ for some $p_1(x) \in P_2(\mathbb{R})$. 
That is 

\[ [p_1(x)]_P = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]

This tells us that the eigenvector $p_1(x)$ is $1(1) + 1(x) + 1(x^2) = 1 + x + x^2$. Thus, the eigenspace 
corresponding to eigenvalue $\lambda_1 = 1$ is 

\[ \mathcal{E}_1 = \mathrm{Span}\{1 + x + x^2\}. \]

Indeed, 

\begin{align*}
L(p_1(x)) &= L(1 + x + x^2) \\
&= (1 - 1 + 1) + (1 + 1 - 1)x + (-1 + 1 + 1)x^2 \\
&= 1 + x + x^2 \\
&= 1 \cdot p_1(x) \\
&= \lambda_1 p_1(x).
\end{align*}
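
The reader is encouraged to check the computations of Example 6.2.6. A short Python sketch of such a check follows; the helper names mat_vec and check_start_vector are assumptions of ours. For a chosen starting vector w, it recomputes $M^3 w$, evaluates the left side of the dependence relation $M^3 w - 3M^2 w + 6Mw - 4w$, and forms the candidate eigenvector $M^2 w - 2Mw + 4w$.

```python
def mat_vec(M, v):
    """Multiply the matrix M (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

# Matrix of L relative to the standard basis {1, x, x^2}, as in Example 6.2.6.
M = [[1, -1, 1],
     [1, 1, -1],
     [-1, 1, 1]]

def check_start_vector(w):
    """Return M^3 w, the value of M^3 w - 3 M^2 w + 6 M w - 4 w, and M^2 w - 2 M w + 4 w."""
    Mw = mat_vec(M, w)
    M2w = mat_vec(M, Mw)
    M3w = mat_vec(M, M2w)
    relation = [a - 3 * b + 6 * c - 4 * d for a, b, c, d in zip(M3w, M2w, Mw, w)]
    candidate = [b - 2 * c + 4 * d for b, c, d in zip(M2w, Mw, w)]
    return M3w, relation, candidate

for w in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    M3w, relation, candidate = check_start_vector(w)
    print(w, M3w, relation, candidate, mat_vec(M, candidate) == candidate)
```

Each of the three standard basis vectors satisfies the same relation and leads to the same candidate vector, which agrees with the observation in the example that the later passes reveal no new eigenvector information.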


The next example illustrates how to find a full complement of eigenspaces when multiple passes 
through the algorithm may be needed to find all n potential eigenvalues. 

Example 6.2.7 We wish to find the eigenvalues and eigenspaces of the matrix 

\[ M = \begin{pmatrix} 3 & -4 & 2 \\ 2 & -3 & 2 \\ 0 & 0 & 1 \end{pmatrix}. \]

We proceed, as before, by choosing the standard basis $\mathcal{B}$ for $\mathbb{R}^3$, and selecting as an initial vector 
$w = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$. We have the linearly dependent set 

\[ \{w, Mw, M^2 w\} = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right\}. \]

This set has the linear dependence relation $M^2 w - w = 0$. This relation factors as 
(M - I)(M + I)w = 0, leading to two different factorizations of the form $(M - \lambda I)v = 0$. First, 
(M - I)(Mw + w) = 0 yields $\lambda_1 = 1$ with $v_1 = Mw + w = \begin{pmatrix} 4 \\ 2 \\ 0 \end{pmatrix}$. Second, (M + I)(Mw - w) = 0 
yields $\lambda_2 = -1$ with $v_2 = Mw - w = \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix}$. 

We now have a set of two linearly independent eigenvectors. We next choose a basis vector that is 
not in the span of this set. The second basis vector $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ is a linear combination of $v_1$ and $v_2$, so we 
cannot select it and we need not consider it further. The third basis vector $w = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$ is not in the span, 
so we proceed with this choice. We have the linearly dependent set 

\[ S = \left\{ w = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},\ Mw = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix},\ M^2 w = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\} \]

and corresponding linear dependence relation $M^2 w - w = 0$. This relation factors as 
(M - I)(M + I)w = 0. First, (M - I)(Mw + w) = 0 yields $\lambda_1 = 1$ with $v_3 = Mw + w = \begin{pmatrix} 2 \\ 2 \\ 2 \end{pmatrix}$. 
Second, (M + I)(Mw - w) = 0 yields $\lambda_2 = -1$ with $v_2 = Mw - w = \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix}$, an eigenvector that 
we found previously. We now have a set of three linearly independent eigenvectors $v_1$, $v_2$, and $v_3$ 
with corresponding eigenvalues $\lambda = 1, -1, 1$. Eigenvalue $\lambda = 1$ is associated with an eigenspace 
of dimension 2 and eigenvalue $\lambda = -1$ is associated with an eigenspace of dimension 1. We have 
exhausted the basis $\mathcal{B}$ so the algorithm terminates and our collection is complete. 

\[ \lambda_1 = 1, \quad \mathcal{E}_1 = \mathrm{Span}\left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\} \]

\[ \lambda_2 = -1, \quad \mathcal{E}_2 = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \right\} \]


Why does the algorithm work? 
How do we know that we will find all eigenvalues (and corresponding eigenspaces) using this method? 
In any one iteration of our algorithm, starting with vector w, the eigenvectors produced will be linear 
combinations of the vectors $\{w, Mw, \ldots, M^k w\}$. Because we use the minimum k so that this set is 
linearly dependent, there is essentially (up to constant multiples) only one linear dependence relation, 
for which there is one overall factorization. Hence after this iteration through the algorithm, we have 
found all eigenvectors in $\mathrm{Span}\{w, Mw, \ldots, M^k w\}$. As the algorithm allows a search over a complete 
set of basis vectors, all possible eigenspaces will be found. 

Fortunately, the computation of eigenvalues is often simplified by the particular structure of the 
matrix. Consider the following results. 


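Lemma 6.2.8 
Suppose the kth column of the $n \times n$ real matrix M is $\beta e_k$, a scalar multiple of the kth standard basis 
vector for $\mathbb{R}^n$. Then $e_k$ is an eigenvector of M with eigenvalue $\beta$. 


Proof. Suppose the jth column of the $n \times n$ real matrix M is $M_j$, and $M_k = \beta e_k$. Let $(e_k)_j$ denote the 
jth entry of the vector $e_k$. Then, we have 

\begin{align*}
M e_k &= M_1 (e_k)_1 + M_2 (e_k)_2 + \cdots + M_n (e_k)_n \\
&= M_1(0) + \cdots + M_k(1) + \cdots + M_n(0) \\
&= M_k \\
&= \beta e_k.
\end{align*}

Therefore, $e_k$ is an eigenvector of M with eigenvalue $\beta$. 


Example 6.2.9 Consider the matrix $M = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}$. Because $M_1 = \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} = 2e_1$, $e_1$ is an eigenvector of 
M with eigenvalue 2. A simple matrix-vector multiplication will verify this fact. 

\[ M e_1 = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = 1 \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} + 0 \begin{pmatrix} 0 \\ 3 \\ 0 \end{pmatrix} + 0 \begin{pmatrix} 0 \\ 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} = 2e_1. \]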


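Notice also that $e_2$ is an eigenvector of M (with eigenvalue 3) because $M_2 = 3e_2$. 

Theorem 6.2.10 
The eigenvalues of an upper (or lower) triangular $n \times n$ matrix M are the diagonal entries, $m_{kk}$, 
$k = 1, 2, \ldots, n$. 


Proof. Let M be the upper triangular matrix given by 

\[ M = \begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1n} \\ 0 & m_{22} & \cdots & m_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & m_{nn} \end{pmatrix}. \]

Then by Lemma 6.2.8, $v_1 = e_1$ is an eigenvector of M with eigenvalue $m_{11}$. Next, let us assume 
$m_{22} \neq m_{11}$ (for otherwise, we already have a vector with eigenvalue $m_{22}$) and consider the vector 

\[ v_2 = \begin{pmatrix} \dfrac{-m_{12}}{m_{11} - m_{22}} \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]

Then 

\[ M v_2 = \begin{pmatrix} \dfrac{m_{11}(-m_{12}) + m_{12}(m_{11} - m_{22})}{m_{11} - m_{22}} \\ m_{22} \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \dfrac{-m_{12} m_{22}}{m_{11} - m_{22}} \\ m_{22} \\ 0 \\ \vdots \\ 0 \end{pmatrix} = m_{22} v_2. \]

Therefore, $m_{22}$ is an eigenvalue of M corresponding to $v_2$. In the same vein, for $k \le n$, let us assume 
that $m_{kk} \neq m_{ii}$ for any i < k and consider the vector 

\[ v_k = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \]

where 

\[ a_k = 1, \qquad a_i = 0 \ \text{for } i > k, \qquad a_i = -\frac{1}{m_{ii} - m_{kk}} \sum_{j=i+1}^{k} a_j m_{ij} \ \text{for } i < k. \]

Then, for i < k, the ith entry in $M v_k$ is given by 

\begin{align*}
(M v_k)_i &= \sum_{j=1}^{n} m_{ij} a_j \\
&= m_{ii} a_i + m_{i(i+1)} a_{i+1} + \cdots + m_{i(k-1)} a_{k-1} + m_{ik} \\
&= -\frac{m_{ii}}{m_{ii} - m_{kk}} \sum_{j=i+1}^{k} a_j m_{ij} + \sum_{j=i+1}^{k} m_{ij} a_j \\
&= \sum_{j=i+1}^{k} \frac{-a_j m_{ij} m_{ii} + m_{ij} a_j (m_{ii} - m_{kk})}{m_{ii} - m_{kk}} \\
&= -\frac{m_{kk}}{m_{ii} - m_{kk}} \sum_{j=i+1}^{k} m_{ij} a_j \\
&= m_{kk} a_i.
\end{align*}

The entries with $i \ge k$ are immediate because M is upper triangular: $(M v_k)_k = m_{kk}$ and $(M v_k)_i = 0$ for i > k. 
So, $M v_k = m_{kk} v_k$. Therefore, $v_k$ is an eigenvector of M with eigenvalue $m_{kk}$. 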
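
The construction of $v_k$ in the proof amounts to a back substitution, which can be sketched in Python as follows. This is a minimal illustration: the function name triangular_eigenvector, the 0-based index k, and the sample matrix are our own assumptions, and the sketch raises an error when a diagonal entry above row k repeats $m_{kk}$, since the proof's formula does not cover that case.

```python
from fractions import Fraction

def mat_vec(M, v):
    """Multiply the matrix M (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def triangular_eigenvector(M, k):
    """Eigenvector for the diagonal entry M[k][k] of an upper triangular M (0-based k)."""
    n = len(M)
    mkk = Fraction(M[k][k])
    a = [Fraction(0)] * n              # entries below position k stay zero
    a[k] = Fraction(1)
    for i in range(k - 1, -1, -1):     # fill a_{k-1}, ..., a_0 in turn
        if M[i][i] == mkk:
            raise ValueError("construction assumes m_kk differs from m_ii for i < k")
        s = sum(a[j] * M[i][j] for j in range(i + 1, k + 1))
        a[i] = -s / (M[i][i] - mkk)
    return a

# Hypothetical upper triangular matrix with distinct diagonal entries.
M = [[2, 1, 4],
     [0, 3, 5],
     [0, 0, 7]]

v = triangular_eigenvector(M, 2)       # eigenvector for the diagonal entry 7
print(v, mat_vec(M, v) == [7 * x for x in v])
```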


It may seem that $v_k$ in the proof above magically appears. In order to produce the vector $v_k$ above, 
we first did some scratch work. We considered the matrix equation $(M - m_{22} I)v = 0$ for the matrix 

\[ M = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}. \]

We then used matrix reduction to find a solution. We then repeated this for a $4 \times 4$ matrix M. We saw 
a pattern and tried it on an $n \times n$ matrix with diagonal entries $m_{ii}$, $1 \le i \le n$. The important message 
here is that a proof does not typically get pushed through without behind-the-scenes scratch work. We 
encourage the reader to do these steps before starting other proofs of this nature. 

Another result about eigenvalues that will become important in Section 6.4 is the following theorem. 


Theorem 6.2.11 
Let M be an $n \times n$ matrix. Then, $\lambda$ is an eigenvalue of M if and only if $\lambda$ is an eigenvalue of $M^T$. 


Proof. Let M be an $n \times n$ matrix. 
($\Rightarrow$) Suppose $\lambda$ is an eigenvalue of M. Then $M - \lambda I$ is not invertible. Then, by Theorem 5.2.20, 
we know that $(M - \lambda I)^T$ is not invertible. But, 

\[ M^T - \lambda I = (M - \lambda I)^T. \quad \text{(See Exercise 36)} \]

Therefore, $M^T - \lambda I$ is not invertible. That is, $\lambda$ is an eigenvalue for $M^T$. 
($\Leftarrow$) To show that if $\lambda$ is an eigenvalue of $M^T$ then it is also an eigenvalue of M, we use the same 
argument with $M^T$ in place of M and recognize that $(M^T)^T = M$. 


6.2.3 Using Determinants to Find Eigenvalues 


A fairly standard approach to finding eigenvalues involves the use of determinants. Here, we describe 
the connections between determinants and eigenvalues. First, let us recall the Heat Diffusion exploration 
in Section 4.3. In the exploration, we found that, for certain heat states, applying the diffusion operator 
resulted in a change in amplitude only. That is, these heat states satisfy the eigenvector equation 

\[ E v = \lambda v, \tag{6.3} \]

where v is an eigenvector with eigenvalue $\lambda$. We also found that these heat state vectors satisfy the 
matrix equation 

\[ (E - \lambda I) v = 0. \tag{6.4} \]

This is a homogeneous equation, and hence has a solution. This means that either there is a unique 
solution (only the trivial solution v = 0) or infinitely many solutions. If we begin with the zero heat 
state (all temperatures are the same everywhere along the rod) then the diffusion is trivial because 
nothing happens. It would be nice to find a nonzero vector satisfying the matrix Equation (6.4) because 
it gets us closer to the desirable situation of having a basis. 

Notice that $\det(M - \lambda I)$ is a polynomial in $\lambda$. This polynomial will be very useful in our discussion 
of eigenvalues and thus it deserves a name. 


Definition 6.2.12 


The function 

\[ f(x) = \det(M - xI) \]

is called the characteristic polynomial of L (and of M). 
The corresponding equation, used to find the eigenvalues, 

\[ \det(M - \lambda I) = 0 \]

is called the characteristic equation. 


Theorem 4.5.20 tells us that Equation 6.3 has a nonzero solution as long as $\lambda$ is a solution to the 
characteristic equation, that is, 

\[ \det(E - \lambda I) = 0. \]

This observation gives us an equivalent characterization for eigenvalues that we present in the next 
theorem. 

Theorem 6.2.13 
Let M be an $n \times n$ matrix. Then $\lambda$ is an eigenvalue of M for some eigenvector $v \in \mathbb{R}^n$ if and 
only if $\lambda$ is a solution to 

\[ \det(M - \lambda I) = 0. \]


Proof. Let M be an $n \times n$ matrix. 
($\Rightarrow$) Suppose $\lambda$ is an eigenvalue of M. Then $Mv = \lambda v$ for some nonzero vector v and $(M - \lambda I)v = 0$ 
has nontrivial solutions. Since a nontrivial solution exists, $M - \lambda I$ is not invertible and 
we must have $\det(M - \lambda I) = 0$. That is, $\lambda$ is a zero of the characteristic polynomial $\det(M - \lambda I)$. 
($\Leftarrow$) Now suppose that $\lambda$ is a zero of the characteristic polynomial $\det(M - \lambda I)$. Then 
$\det(M - \lambda I) = 0$ and $(M - \lambda I)v = 0$ has some nontrivial solution v. That is, there exists nonzero v such that 
$Mv = \lambda v$ and $\lambda$ is an eigenvalue of M. 


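We now use Theorem 6.2.13 as a tool and revisit the Factor Algorithm examples in the context of 
determinants. 


Example 6.2.14 Consider the linear transformation of Example 6.2.2, $L : \mathbb{R}^2 \to \mathbb{R}^2$ defined by Lx = Mx 
where $M = \begin{pmatrix} 1 & 1 \\ 3 & -1 \end{pmatrix}$. We can find the eigenvalues of M by finding the zeros of the characteristic 
polynomial $\det(M - \lambda I)$ as follows. First we compute the matrix $M - \lambda I$. 

\begin{align*}
M - \lambda I &= \begin{pmatrix} 1 & 1 \\ 3 & -1 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \\
&= \begin{pmatrix} 1 - \lambda & 1 \\ 3 & -1 - \lambda \end{pmatrix}.
\end{align*}

Therefore, the characteristic polynomial is 

\[ \det(M - \lambda I) = \begin{vmatrix} 1 - \lambda & 1 \\ 3 & -1 - \lambda \end{vmatrix} = (1 - \lambda)(-1 - \lambda) - 3. \]

Finally, we find the zeros of the (quadratic) characteristic polynomial. That is, we solve 

\[ (1 - \lambda)(-1 - \lambda) - 3 = 0. \]

And, we find that $\lambda_1 = -2$ and $\lambda_2 = 2$ are solutions. That is, the eigenvalues of M are $\lambda_1 = -2$ and 
$\lambda_2 = 2$. 

Theorem 6.2.13 gives us a method to find the eigenvectors of M. That is, we now solve the matrix 
equations $(M - \lambda I)v = 0$ for $\lambda_1 = -2$ and $\lambda_2 = 2$. 

For $\lambda_1 = -2$, we have 

\[ (M - \lambda_1 I)v = \begin{pmatrix} 3 & 1 \\ 3 & 1 \end{pmatrix} v = 0. \]

Using matrix reduction, we see that the solutions to $(M - \lambda_1 I)v = 0$ are of the form $v = \begin{pmatrix} \alpha \\ -3\alpha \end{pmatrix}$. 
Therefore, the eigenspace corresponding to $\lambda_1 = -2$ is 

\[ \mathcal{E}_1 = \left\{ \begin{pmatrix} \alpha \\ -3\alpha \end{pmatrix} \,\middle|\, \alpha \in \mathbb{R} \right\} = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ -3 \end{pmatrix} \right\}. \]

Similarly, we find that the eigenspace for $\lambda_2 = 2$ is 

\[ \mathcal{E}_2 = \left\{ \begin{pmatrix} \alpha \\ \alpha \end{pmatrix} \,\middle|\, \alpha \in \mathbb{R} \right\} = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}. \]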
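
For a 2 x 2 matrix, the determinant approach of Example 6.2.14 can be sketched in Python. The helper names char_poly_2x2, rational_eigenvalues_2x2, and eigenvector_2x2 are our own assumptions, and the sketch is deliberately limited: it expects integer entries and only reports real eigenvalues when the discriminant is a perfect square, so that all arithmetic stays exact.

```python
from fractions import Fraction
from math import isqrt

def char_poly_2x2(M):
    """Coefficients (1, c1, c0) of det(M - t I) = t^2 + c1 t + c0 for a 2 x 2 matrix."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)

def rational_eigenvalues_2x2(M):
    """Real zeros of the characteristic polynomial, assuming integer entries."""
    _, c1, c0 = char_poly_2x2(M)
    disc = c1 * c1 - 4 * c0
    if disc < 0:
        return []                       # no real eigenvalues
    r = isqrt(disc)
    if r * r != disc:
        raise ValueError("this sketch only handles perfect-square discriminants")
    return sorted({Fraction(-c1 - r, 2), Fraction(-c1 + r, 2)})

def eigenvector_2x2(M, lam):
    """A nonzero solution of (M - lam I) v = 0, assuming lam is an eigenvalue of M."""
    (a, b), (c, d) = M
    if (a - lam, b) != (0, 0):
        return [b, lam - a]             # orthogonal to the first row of M - lam I
    if (c, d - lam) != (0, 0):
        return [lam - d, c]             # orthogonal to the second row of M - lam I
    return [1, 0]                       # M - lam I is the zero matrix

M = [[1, 1],
     [3, -1]]                           # the matrix of Example 6.2.14

for lam in rational_eigenvalues_2x2(M):
    print(lam, eigenvector_2x2(M, lam))
```

The eigenvector step uses the fact that, when $M - \lambda I$ is singular and has a nonzero row, any vector orthogonal to that row solves the homogeneous system.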


It is critical to note that eigenvalues are elements of the scalar set over which scalar-vector 
multiplication is defined. Consider, for example, a possible characteristic polynomial 

\[ -(\lambda - 4)(\lambda^2 + 1). \]

If the scalar set for the vector space in which we are working is $\mathbb{R}$, then there is only one zero, namely, 
$\lambda = 4$, of the characteristic polynomial. The complex zeros, $\lambda = \pm i$, are not considered in the context 
of real-valued vector spaces. Finally, consider the characteristic polynomial $-(\lambda - 4)(\lambda - 1)(\lambda - 1)$. 
There are two real zeros $\lambda = 1, 4$, but $\lambda = 1$ has a multiplicity of 2 and therefore gets counted twice. 
This means that we will say that there are three real zeros and will be careful to indicate when these 
zeros are distinct (no repetition). 

We now give one more example where we find the eigenvalues of a matrix. Again, the procedure we 
will use begins by finding the zeros of the characteristic polynomial. Then, eigenvectors are determined 
as solutions to Equation 6.4 for the given eigenvalue A. 


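Example 6.2.15 Let us find the eigenvalues and eigenvectors of the matrix 

\[ A = \begin{pmatrix} 5 & -3 \\ 6 & -4 \end{pmatrix}. \]

First, we find the zeros of the characteristic polynomial: 

\begin{align*}
\det(A - \lambda I) &= 0 \\
\left| \begin{pmatrix} 5 & -3 \\ 6 & -4 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right| &= 0 \\
\begin{vmatrix} 5 - \lambda & -3 \\ 6 & -4 - \lambda \end{vmatrix} &= 0 \\
(5 - \lambda)(-4 - \lambda) + 18 &= 0 \\
\lambda^2 - \lambda - 2 &= 0 \\
(\lambda + 1)(\lambda - 2) &= 0 \\
\lambda = -1, \ \lambda &= 2.
\end{align*}

So, now we have two eigenvalues $\lambda_1 = -1$ and $\lambda_2 = 2$. (The subscripts on the $\lambda$'s have nothing to do 
with the actual eigenvalue. We are just numbering them.) Using these eigenvalues and Equation 6.4, 
we can find the corresponding eigenvectors. Let's start with $\lambda_1 = -1$. We want to find v so that 

\[ (A - (-1)I)v = 0. \]

We can set this up as a system of equations: 

\begin{align*}
(A + I)v &= 0 \\
\begin{pmatrix} 6 & -3 \\ 6 & -3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} &= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\
6x - 3y &= 0 \\
6x - 3y &= 0.
\end{align*}

We can see that this system has infinitely many solutions (that's what we expected) and the solution 
space is 

\[ \mathcal{E}_1 = \left\{ \begin{pmatrix} \alpha \\ 2\alpha \end{pmatrix} \,\middle|\, \alpha \in \mathbb{R} \right\} = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\}. \]

Using the same process, we can find the eigenvectors corresponding to $\lambda_2 = 2$: 

\begin{align*}
(A - 2I)v &= 0 \\
\begin{pmatrix} 3 & -3 \\ 6 & -6 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} &= \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\end{align*}

So, the solution space to this system is 

\[ \mathcal{E}_2 = \left\{ \begin{pmatrix} \alpha \\ \alpha \end{pmatrix} \,\middle|\, \alpha \in \mathbb{R} \right\} = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}. \]

We can verify that every nonzero vector in $\mathcal{E}_2$ is an eigenvector of A with eigenvalue $\lambda_2 = 2$. Notice, 

\[ Av = \begin{pmatrix} 5 & -3 \\ 6 & -4 \end{pmatrix} \begin{pmatrix} \alpha \\ \alpha \end{pmatrix} = \begin{pmatrix} 2\alpha \\ 2\alpha \end{pmatrix} = 2v = \lambda_2 v. \]

The reader can verify that any nonzero vector in $\mathcal{E}_1$ is an eigenvector of A with eigenvalue $\lambda_1 = -1$. 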


6.2.4 Eigenbases 


Now that we can find any eigenvalues and eigenvectors of a linear transformation, we would like to 
determine if a basis of eigenvectors exists. Computations, such as those from our heat state evolution 
application, would be much simpler when using a basis of eigenvectors. Since a basis consists of 
linearly independent vectors, the next theorem is a giant step in the right direction. 


Theorem 6.2.16 


Suppose V is a vector space. Let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be distinct eigenvalues of the linear 
transformation $L : V \to V$ with a set of corresponding eigenvectors $v_1, v_2, \ldots, v_k$. Then, 
$\{v_1, v_2, \ldots, v_k\}$ is linearly independent. 


Proof. (By induction on k.) Let $L : V \to V$ be a linear transformation. Suppose k = 1, and $v_1$ is an 
eigenvector of L. Then $\{v_1\}$ is linearly independent because $v_1 \neq 0$. Now, for $k \ge 1$, let $\lambda_1, \lambda_2, \ldots, \lambda_k$ 
be distinct eigenvalues of L with eigenvectors $v_1, v_2, \ldots, v_k$. Assume that $\{v_1, v_2, \ldots, v_k\}$ is linearly 
independent for $k \ge 1$ and for distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. We show that $\{v_1, v_2, \ldots, v_k, v_{k+1}\}$ 
is linearly independent when $\lambda_{k+1} \neq \lambda_j$, $1 \le j \le k$. Since $v_{k+1} \in \mathrm{Null}(L - \lambda_{k+1} I)$, we need only show 
that no nonzero vector $w \in \mathrm{Span}\{v_1, v_2, \ldots, v_k\}$ is in $\mathrm{Null}(L - \lambda_{k+1} I)$. Let 
$w = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_k v_k$. Then 

\begin{align*}
(L - \lambda_{k+1} I)w &= (L - \lambda_{k+1} I)(\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_k v_k) \\
&= (\lambda_1 - \lambda_{k+1})\alpha_1 v_1 + (\lambda_2 - \lambda_{k+1})\alpha_2 v_2 + \cdots + (\lambda_k - \lambda_{k+1})\alpha_k v_k.
\end{align*}

Since the eigenvalues are distinct and the vectors $v_1, v_2, \ldots, v_k$ are linearly independent, if any $\alpha_j \neq 0$ 
then $w \notin \mathrm{Null}(L - \lambda_{k+1} I)$. Thus, no nonzero w is in $\mathrm{Null}(L - \lambda_{k+1} I)$ and so we conclude that 
$v_{k+1} \notin \mathrm{Span}\{v_1, v_2, \ldots, v_k\}$ and $\{v_1, v_2, \ldots, v_k, v_{k+1}\}$ is linearly independent. 


Corollary 6.2.17 
Let $V$ be an $n$-dimensional vector space and let $L : V \to V$ be linear. If $L$ has $n$ distinct
eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ with corresponding eigenvectors $v_1, v_2, \ldots, v_n$, then $\{v_1, v_2, \ldots, v_n\}$
is a basis for $V$.


The proof is the subject of Exercise 32.

Up to this point, we have considered only real eigenvalues. If we consider complex eigenvalues,
we change the fourth step of the eigenvalue algorithm by looking for all roots of the polynomial $p$.
The Fundamental Theorem of Algebra² guarantees $k$ (possibly repeated) roots of $p$. Each root provides
an eigenvalue. Though we will not prove it, we need the following result that states that the algorithm
will produce $n$ eigenvalues (some of which may be repeated or complex).


Theorem 6.2.18 
If $M$ is an $n \times n$ matrix, then there are $n$, not necessarily distinct, eigenvalues of $M$.


When we are searching for eigenvectors and eigenvalues of an $n \times n$ matrix $M$, we are considering
the linear transformation $L : \mathbb{R}^n \to \mathbb{R}^n$ defined by $L(v) = Mv$ whose eigenvectors are vectors in
the domain space that are scaled (by the eigenvalue) when we apply $L$ to them. That is, $v$ is an
eigenvector with corresponding eigenvalue $\lambda \in \mathbb{R}$ if $v \in \mathbb{R}^n$ is nonzero and $L(v) = \lambda v$. An eigenbasis is just a
basis for $\mathbb{R}^n$ made up of eigenvectors.

Similarly, when searching for eigenvectors and eigenvalues of a general linear transformation
$L : V \to V$, eigenvectors of $L$ are domain vectors which are scaled by a scalar under the action of $L$. An
eigenbasis for $V$ is a basis made up of eigenvectors of $L$.


Definition 6.2.19 


Let $V$ be a vector space and $L : V \to V$ linear. A basis for $V$ consisting of eigenvectors of $L$ is
called an eigenbasis of $L$ for $V$.


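Recall Example 6.2.5, where we found eigenvalues and eigenvectors of matrix $A = \begin{pmatrix} 5 & -3 \\ 6 & -4 \end{pmatrix}$. If
we create a set consisting of basis elements for each of the eigenspaces $\mathcal{E}_1$ and $\mathcal{E}_2$, we get the set
\[ \mathcal{B} = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}, \]
which is a basis for $\mathbb{R}^2$.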


In Example 6.2.6, we found that the transformation L had only one eigenvalue. As L is a 
transformation on a space of dimension 3, Corollary 6.2.17 does not apply, and we do not yet know if 
we can obtain an eigenbasis. 


Example 6.2.20 Consider the matrix
\[ A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}. \]

We want to know if there is an eigenbasis of $A$ for $\mathbb{R}^3$. We begin by finding the eigenvectors and
eigenvalues. That is, we want to know for which nonzero vectors $v$ and scalars $\lambda$ does $Av = \lambda v$. We
know that the eigenvalues of $A$ are $\lambda = 2, 3$ by Theorem 6.2.10. We also know, by Lemma 6.2.8, that
$e_1$ is an eigenvector with eigenvalue 2 and $e_2$ is an eigenvector with eigenvalue 3. To obtain the final
eigenvector, we choose a vector $y$ so that $\{e_1, e_2, y\}$ is linearly independent and apply the algorithm
on page 335. Choosing $y = e_3$, we find the linearly dependent set
\[ \{y, Ay, A^2 y\} = \left\{ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 3 \end{pmatrix}, \begin{pmatrix} 0 \\ 6 \\ 9 \end{pmatrix} \right\}. \]

The linear dependence relation yields the equation and its factorization
\[ 0 = A^2 y - 6Ay + 9y = (A - 3I)(Ay - 3y). \]

We have the eigenvalue 3 and eigenvector $Ay - 3y = (0, 1, 0)^T$, which is the same eigenvector we
previously found. Thus, the two eigenspaces are
\[ \mathcal{E}_1 = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right\}, \quad \mathcal{E}_2 = \mathrm{Span}\left\{ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right\}. \]

We cannot form an eigenbasis of $A$ for $\mathbb{R}^3$ because we can collect at most two linearly independent
eigenvectors, one from $\mathcal{E}_1$ and one from $\mathcal{E}_2$.


² The Fundamental Theorem of Algebra states that any $n$th degree polynomial with real coefficients has $n$ (not necessarily
distinct) zeros. These zeros might be complex or real.
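
The computation in Example 6.2.20 can be replayed in a few lines. This is an illustrative Python sketch, not code from the text; the helper names `matvec` and `combo` are hypothetical. It builds $y$, $Ay$, $A^2y$, confirms the dependence relation $A^2y - 6Ay + 9y = 0$, and checks that $Ay - 3y$ is an eigenvector with eigenvalue 3.

```python
# Replaying Example 6.2.20 with exact integer arithmetic (helper names are illustrative).
A = [[2, 0, 0],
     [0, 3, 1],
     [0, 0, 3]]

def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def combo(coeffs, vectors):
    """Return the linear combination sum_i coeffs[i] * vectors[i]."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]

y = [0, 0, 1]            # y = e_3
Ay = matvec(A, y)
A2y = matvec(A, Ay)

relation = combo([1, -6, 9], [A2y, Ay, y])   # A^2 y - 6 A y + 9 y
v = combo([1, -3], [Ay, y])                  # A y - 3 y

print(relation)                               # the zero vector
print(v, matvec(A, v) == [3 * x for x in v])  # eigenvector check for eigenvalue 3
```

Because the vector produced is again a multiple of $e_2$, the search yields no eigenvector outside the two eigenspaces already found.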


6.2.5 Diagonalizable Transformations 


Our explorations in heat diffusion have shown us that if $\mathcal{E} = \{v_1, v_2, \ldots, v_n\}$ is an eigenbasis for $\mathbb{R}^n$
corresponding to the diffusion matrix $E$ then we can write any initial heat state vector $v \in \mathbb{R}^n$ as
\[ v = \alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_n v_n. \]

Suppose these eigenvectors have eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, respectively. Then with this decomposition
into eigenvectors, we can find the heat state at any later time (say $k$ time steps later) by multiplying
the initial heat state by $E^k$. This became an easy computation with the above decomposition because
it gives us, using the linearity of matrix multiplication,
\[ E^k v = E^k(\alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_n v_n) = \alpha_1 \lambda_1^k v_1 + \alpha_2 \lambda_2^k v_2 + \ldots + \alpha_n \lambda_n^k v_n. \tag{6.5} \]

We can then apply our knowledge of limits from Calculus here to find the long-term behavior. That is,
the long-term behavior is
\[ \lim_{k \to \infty} \left( \alpha_1 \lambda_1^k v_1 + \alpha_2 \lambda_2^k v_2 + \ldots + \alpha_n \lambda_n^k v_n \right). \]


We see that this limit really depends on the size of the eigenvalues. But we have also seen that 
changing the representation basis to an eigenbasis was convenient for heat state calculations. Let’s 
remind ourselves how we went about that. First, if we want all computations in the eigenbasis, we 
have to recalculate the diffusion transformation matrix as well. In other words, we want a matrix 
transformation that does the same thing that L(v) = Ev does, but the new matrix is created using the 
eigenbasis. Specifically, we want the matrix representation for the linear transformation that takes a 
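coordinate vector $[v]_{\mathcal{E}}$ (where $\mathcal{E} = \{v_1, v_2, \ldots, v_n\}$ is the eigenbasis) and maps it to $[Ev]_{\mathcal{E}}$. Call this
matrix $E'$. What we are saying is that we want $E'[v]_{\mathcal{E}} = [Ev]_{\mathcal{E}}$. As always, the columns of this matrix
are the vectors that are the result of applying the transformation to the current basis elements (in $\mathcal{E}$).
Thus, the columns of $E'$ are $[Ev_1]_{\mathcal{E}}, [Ev_2]_{\mathcal{E}}, \ldots, [Ev_n]_{\mathcal{E}}$. But
\[ [Ev_1]_{\mathcal{E}} = [\lambda_1 v_1]_{\mathcal{E}} = \lambda_1 [v_1]_{\mathcal{E}} = \lambda_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \]
\[ [Ev_2]_{\mathcal{E}} = [\lambda_2 v_2]_{\mathcal{E}} = \lambda_2 [v_2]_{\mathcal{E}} = \lambda_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \]
\[ \vdots \]
\[ [Ev_n]_{\mathcal{E}} = [\lambda_n v_n]_{\mathcal{E}} = \lambda_n [v_n]_{\mathcal{E}} = \lambda_n \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}. \]
So, we have
\[ E' = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix}. \]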


Knowing that a change of basis is a linear transformation (actually, an isomorphism), we can find its
matrix representation (usually known as a change of basis matrix). Let's call this matrix $Q$ and see
how this works. We know that $Q[v]_{\mathcal{E}} = [v]_{\mathcal{S}}$. This means that if we are given a coordinate vector with
respect to the basis $\mathcal{E}$, this transformation will output a coordinate vector with respect to the standard
basis $\mathcal{S}$. Recall that to get the coordinate vector in the new basis, we solve for the coefficients in
\[ v = \alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_n v_n. \]

Then
\[ [v]_{\mathcal{E}} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}. \]

One way to solve this is to set up the matrix equation
\[ \begin{pmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_n \\ | & | & & | \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} = v. \]

This is the transformation written in matrix form. The matrix representation that takes a coordinate
vector with respect to the basis $\mathcal{E}$ to a coordinate vector with respect to the standard basis is
\[ Q = \begin{pmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_n \\ | & | & & | \end{pmatrix}. \]

So the matrix representation for the transformation that changes from the eigenbasis $\mathcal{E}$ to the standard
basis is given by $Q$. The columns of $Q$ are the eigenvectors of $E$ written with respect to standard
coordinates. Because $\mathcal{E}$ is a basis of $\mathbb{R}^n$, we can use Theorem 5.2.20 to see that $Q^{-1}$ exists. We use all
of this information to rewrite
\[ E'[u]_{\mathcal{E}} = [v]_{\mathcal{E}}. \]


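That is, $[u]_{\mathcal{E}} = Q^{-1}u$ and $[v]_{\mathcal{E}} = Q^{-1}v$ for some $u$ and $v$ in the standard basis. So, we have
\[ Q^{-1}(u(t + \Delta t)) = Q^{-1}(Eu(t)) = E'Q^{-1}u(t), \]
\[ u(t + \Delta t) = Eu(t) = QE'Q^{-1}u(t). \]
It is straightforward to show that for time step $k$:
\[ u(t + k\Delta t) = E^k u(t) = Q(E')^k Q^{-1}u(t), \]
\[ u(t + k\Delta t) = E^k u(t) = Q \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n^k \end{pmatrix} Q^{-1}u(t), \]
\[ Q^{-1}u(t + k\Delta t) = \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n^k \end{pmatrix} Q^{-1}u(t), \]
\[ [u(t + k\Delta t)]_{\mathcal{E}} = \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n^k \end{pmatrix} [u(t)]_{\mathcal{E}}. \]

We see that when vectors are represented as coordinate vectors with respect to an eigenbasis, the matrix
representation of the transformation is diagonal.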
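
To see the identity $E^k u = Q(E')^k Q^{-1}u$ at work, here is a small Python sketch under stated assumptions. The $2 \times 2$ matrix `E` below is a hypothetical stand-in, not the diffusion matrix of the text; it was chosen because its eigenvectors $(1, 1)$ and $(1, -1)$, with eigenvalues $3/4$ and $1/4$, are easy to check by hand. Exact fractions avoid rounding questions.

```python
from fractions import Fraction as F

# Hypothetical symmetric matrix with eigenpairs (3/4, (1, 1)) and (1/4, (1, -1)).
E = [[F(1, 2), F(1, 4)],
     [F(1, 4), F(1, 2)]]
Q = [[F(1), F(1)],
     [F(1), F(-1)]]                 # columns are the eigenvectors
Q_inv = [[F(1, 2), F(1, 2)],
         [F(1, 2), F(-1, 2)]]       # inverse of Q, computed by hand
eigenvalues = [F(3, 4), F(1, 4)]

def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def evolve_directly(u, k):
    """Apply E to u a total of k times."""
    for _ in range(k):
        u = matvec(E, u)
    return u

def evolve_in_eigenbasis(u, k):
    """Change to eigen-coordinates, scale each by lambda^k, change back."""
    coords = matvec(Q_inv, u)                                   # Q^{-1} u
    scaled = [lam ** k * c for lam, c in zip(eigenvalues, coords)]
    return matvec(Q, scaled)                                    # back to standard coordinates

u0 = [F(3), F(1)]
print(evolve_directly(u0, 5) == evolve_in_eigenbasis(u0, 5))
```

In eigen-coordinates each time step costs one multiplication per coordinate, which is why the long-term behavior can be read off from the sizes of the eigenvalues.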
Of course, all of this is dependent on the existence of an eigenbasis for $\mathbb{R}^n$. Exercise 30 gives the
necessary tools to show that we indeed have an eigenbasis for the diffusion transformation.
Following the same procedure in a general setting, let us see what this means in any context. That is,
we want to know when we can actually decompose a matrix $M$ into a matrix product $QDQ^{-1}$ where $Q$
is invertible and $D$ is diagonal. From above we see that to form the columns of $Q$ we use the eigenvectors
of $M$. This means that as long as we can find an eigenbasis $\{v_1, v_2, \ldots, v_n\}$, then
\[ Q = \begin{pmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_n \\ | & | & & | \end{pmatrix}. \]
The invertibility of $Q$ follows directly from the fact that $\mathrm{Ran}(Q) = \mathrm{Span}\{v_1, v_2, \ldots, v_n\} = \mathbb{R}^n$ and
Theorem 5.2.20.

Definition 6.2.21 


Given a vector space $V$, a linear transformation $L : V \to V$ is called a diagonalizable transformation
if there is an ordered basis $\mathcal{B}$ for $V$ such that $[L]_{\mathcal{B}}$ is a diagonal matrix.


As with other descriptors for transformations, we can also talk about diagonalizable matrices. We 
define that here. 


Definition 6.2.22 


Given an $n \times n$ matrix $M$, we say that $M$ is diagonalizable if there exist an invertible matrix $Q$
and diagonal matrix $D$ so that $M = QDQ^{-1}$.


Before we look at some examples, we have the tools to make three very important statements 
about diagonalizability of linear transformations. The first theorem provides an existence test for 
diagonalizability, only requiring that one compute and test the set of eigenvalues. 


Theorem 6.2.23 
Let $V$ be an $n$-dimensional vector space and let $L : V \to V$ be a linear transformation. If $L$ has $n$ distinct
eigenvalues, then $L$ is diagonalizable.


The proof follows from Corollary 6.2.17 and the discussion above. See Exercise 34.
The second theorem provides a somewhat less-desirable test for diagonalizability. It tells us that if 
an eigenbasis exists for V corresponding to the transformation L then L is diagonalizable. 


Theorem 6.2.24 
Let $V$ be an $n$-dimensional vector space and let $L : V \to V$ be a linear transformation. Then $L$ is diagonalizable
if and only if $L$ has $n$ linearly independent eigenvectors.


The proof of this theorem is the subject of Exercise 35.

We now consider the connection between the eigenvalues of a matrix and its invertibility.


Theorem 6.2.25
Let $M$ be an $n \times n$ matrix. Then $M$ is invertible if and only if 0 is not an eigenvalue of $M$.


Proof. Let $M$ be an $n \times n$ matrix. We will show that if 0 is an eigenvalue of $M$, then $M$ is not invertible.
We will then show that if $M$ is not invertible, then 0 is an eigenvalue.

First, assume 0 is an eigenvalue of $M$. Then there is a nonzero vector $v$ so that $Mv = 0v = 0$. That
is, $v$ is a nonzero solution to the matrix equation $Mx = 0$. Therefore, by Theorem 5.2.20, $M$ is not
invertible.

Now assume $M$ is not invertible. Then, again, by Theorem 5.2.20, there is a nonzero solution, $v$, to
the matrix equation $Mx = 0$. That is, $Mv = 0 = 0 \cdot v$. Hence, $v$ is an eigenvector with eigenvalue 0.
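
A quick numerical illustration of Theorem 6.2.25 may help. The sketch below is plain Python on a hypothetical $2 \times 2$ matrix that does not appear in the text, and it assumes the determinant test for invertibility (a zero determinant means the matrix is not invertible). The helper names `det2` and `matvec` are illustrative.

```python
# Hypothetical 2 x 2 example illustrating Theorem 6.2.25.
def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

singular = [[1, 2],
            [2, 4]]      # second row is twice the first
v = [2, -1]              # nonzero vector sent to the zero vector

print(det2(singular))        # zero determinant: not invertible
print(matvec(singular, v))   # the zero vector, so v is an eigenvector with eigenvalue 0
```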


Theorem 6.2.25 is an additional equivalent statement that can be added to the Invertible Matrix 
Theorem (Theorem 5.2.20). 

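If $M$ is an $n \times n$ matrix with $n$ distinct nonzero eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_n$, then we have a tool to
compute the inverse of $M$. In fact, because $M$ is diagonalizable, we can write
\[ M = QDQ^{-1}, \]
where $D$ is the diagonal matrix
\[ D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \]
Thus, $M^{-1} = QD^{-1}Q^{-1}$, where
\[ D^{-1} = \begin{pmatrix} \frac{1}{\lambda_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{\lambda_2} & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\lambda_n} \end{pmatrix}. \]

We leave it to the reader to check that the inverse is, indeed, the matrix above. Exercise 12 is a result
of this discussion. We now present a couple of examples about the diagonalizability and invertibility
of $M$.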


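Example 6.2.26 Let $M$ be the matrix given by
\[ M = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 3 \\ 1 & 0 & 1 \end{pmatrix}. \]

We want to determine whether $M$ is diagonalizable. If we can find 3 linearly independent eigenvectors
then $M$ is diagonalizable. By Lemma 6.2.8, $v_1 = (0, 1, 0)^T$ is an eigenvector of $M$ with eigenvalue $\lambda_1 = 1$.
To find any remaining eigenvalues and eigenvectors, we apply the Factor Algorithm.
Consider vector $y = (1, 0, 0)^T$ (because $\{v_1, y\}$ is linearly independent). We have the linearly dependent
set
\[ \{y, My, M^2y, M^3y\} = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 7 \\ 2 \end{pmatrix}, \begin{pmatrix} 4 \\ 17 \\ 4 \end{pmatrix} \right\}. \]
The linear dependence relation is $M^3y - 3M^2y + 2My = 0$ with factorizations
\[ (M - 0I)(M^2y - 3My + 2y) = 0 \quad \text{and} \quad (M - 2I)(M^2y - My) = 0. \]

Thus, we have the new eigenvalues and eigenvectors:
\[ \lambda_2 = 0, \quad v_2 = M^2y - 3My + 2y = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \]
\[ \lambda_3 = 2, \quad v_3 = M^2y - My = \begin{pmatrix} 1 \\ 5 \\ 1 \end{pmatrix}. \]

Theorem 6.2.25 tells us that $M$ is not invertible. However, we have the eigenbasis $\{v_1, v_2, v_3\}$ of $n = 3$
vectors, so $M$ is diagonalizable as $M = QDQ^{-1}$ where
\[ Q = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 5 \\ -1 & 0 & 1 \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \]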
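
The arithmetic of Example 6.2.26 can be double-checked with a short Python sketch (an illustration, not code from the text; `matvec` and `matmul` are hypothetical helper names). Rather than inverting $Q$, it verifies the equivalent statement $MQ = QD$, which matches $M = QDQ^{-1}$ whenever $Q$ is invertible.

```python
# Replaying Example 6.2.26 with exact integer arithmetic (helper names are illustrative).
M = [[1, 0, 1],
     [2, 1, 3],
     [1, 0, 1]]

def matvec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    """Return the matrix product A B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

y = [1, 0, 0]
My = matvec(M, y)
M2y = matvec(M, My)
M3y = matvec(M, M2y)

v2 = [a - 3 * b + 2 * c for a, b, c in zip(M2y, My, y)]   # M^2 y - 3 M y + 2 y
v3 = [a - b for a, b in zip(M2y, My)]                     # M^2 y - M y

Q = [[1, 0, 1],
     [1, 1, 5],
     [-1, 0, 1]]          # columns: v2, v1, v3
D = [[0, 0, 0],
     [0, 1, 0],
     [0, 0, 2]]           # matching eigenvalues 0, 1, 2

print(v2, v3)
print(matmul(M, Q) == matmul(Q, D))   # M Q = Q D
```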


In Example 6.2.20, we found only two eigenvalues each of which had eigenspace of dimension 1.
This means that we cannot form an eigenbasis for $\mathbb{R}^3$. Thus, the matrix $A$ of that example is not
diagonalizable. This might lead us to think that we can just count the eigenvalues instead of
eigenvectors. Let's see an example where this is not the case.


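Example 6.2.27 Let $M$ be the matrix given by
\[ M = \begin{pmatrix} 3 & 1 & 0 \\ 1 & 3 & 0 \\ 2 & 2 & 2 \end{pmatrix}. \]

We can find the eigenvalues and eigenvectors of $M$ to determine if $M$ is diagonalizable. Using the
Factor Algorithm, begin with vector $w = (1, 0, 0)^T$. We find the linearly dependent set
\[ \{w, Mw, M^2w\} = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 10 \\ 6 \\ 12 \end{pmatrix} \right\}, \]
which leads to the linear dependence relation $M^2w - 6Mw + 8w = 0$ and factorizations
$(M - 2I)(Mw - 4w) = 0$ and $(M - 4I)(Mw - 2w) = 0$. Thus, we have some eigenvalues and associated
eigenvectors:
\[ \lambda_1 = 2, \quad v_1 = Mw - 4w = \begin{pmatrix} -1 \\ 1 \\ 2 \end{pmatrix}, \]
\[ \lambda_2 = 4, \quad v_2 = Mw - 2w = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}. \]

Next, we seek a third eigenvector, linearly independent relative to the two we know. Notice that $e_3$ is
an eigenvector of $M$ with eigenvalue 2. Because we have three linearly independent eigenvectors, $M$
is diagonalizable as $M = QDQ^{-1}$ where
\[ Q = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \]
even though there are only two distinct eigenvalues. (The columns of $Q$ are $v_2$, $e_3$, and $v_1 - 2e_3$; the
last is again an eigenvector with eigenvalue 2.) Because 0 is not an eigenvalue, Theorem 6.2.25
tells us that $M^{-1}$ exists. In fact,
\[ M^{-1} = QD^{-1}Q^{-1} = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} \frac{1}{4} & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{2} \end{pmatrix} \begin{pmatrix} 1 & 0 & -1 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix}^{-1}. \]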
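
The formula $M^{-1} = QD^{-1}Q^{-1}$ from Example 6.2.27 can be confirmed with exact arithmetic. The Python sketch below is illustrative only; the helpers `matmul` and `inverse` are hypothetical names, and `inverse` is an ordinary Gauss-Jordan elimination used here as an assumption about how one might compute $Q^{-1}$ (the text does not prescribe a method).

```python
from fractions import Fraction

M = [[3, 1, 0], [1, 3, 0], [2, 2, 2]]
Q = [[1, 0, -1], [1, 0, 1], [2, 1, 0]]            # eigenvectors as columns
D_inv = [[Fraction(1, 4), 0, 0],
         [0, Fraction(1, 2), 0],
         [0, 0, Fraction(1, 2)]]                  # reciprocals of the eigenvalues 4, 2, 2

def matmul(A, B):
    """Return the matrix product A B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination over exact fractions."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        # Find a row at or below c with a nonzero entry in column c and swap it up.
        pivot_row = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

M_inv = matmul(matmul(Q, D_inv), inverse(Q))
identity = [[int(i == j) for j in range(3)] for i in range(3)]
print(matmul(M, M_inv) == identity)
```

Only the diagonal matrix needs its entries inverted; the cost of inverting $M$ is shifted to inverting $Q$ once.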


We wrap up this section with a discussion about the appropriate language. 


Watch Your Language! Let $V$ be a vector space with ordered basis $\mathcal{B}$. Consider a matrix $M$ and a linear
transformation $L : V \to V$. In discussion of eigenvectors, eigenvalues, and diagonalizability, it is important
to include the transformation or matrix to which they correspond. We also need to take care in using the
terminology only where appropriate. We present the appropriate use of these terms here.


✓ The eigenvalues of $M$ are $\lambda_1, \lambda_2, \ldots, \lambda_k \in \mathbb{R}$.

✓ The eigenvalues of $L$ are $\lambda_1, \lambda_2, \ldots, \lambda_k \in \mathbb{R}$.

✓ The eigenvectors corresponding to $M$ are vectors in $V$.

✓ The eigenvectors of $L$ are the vectors $v_1, v_2, \ldots, v_k$.

✓ $L$ is diagonalizable because the matrix representation $[L]_{\mathcal{B}}$ is a diagonalizable matrix.


It is inappropriate to say the following things:


✗ The eigenvalues are $\lambda_1, \lambda_2, \ldots, \lambda_k \in \mathbb{R}$.
✗ The eigenvectors are $v_1, v_2, \ldots, v_k$.
✗ $L$ is the diagonalizable matrix $[L]_{\mathcal{B}}$.


Path to New Applications

Nonlinear optimization and optimal design problems use approximation techniques that require 
the tools for finding eigenvalues and eigenvectors of Hessian matrices. See Section 8.3.3 to learn 
how to connect optimization and optimal design to the tools of linear algebra. 


6.2.6 Exercises 


For each of the following matrices, use the Factor Algorithm to find eigenvalues and their corresponding 
eigenspaces. 


2-3 100 

Li (-) 4.M—- [121 

542 

11 

2. m= (15) 3-42 
ih re 5 ye ar 

3. M=|0-2 0 00 
03-1 1000 
1200 
iM 5 49g 
‘cea 


For each of the following matrices, find the characteristic polynomial and then find eigenvalues and 
their corresponding eigenspaces. 


2-3 100 
eas =) 10. M=(121 
542 
‘4 
sw=(1,5) 3-42 
Py ek oe see 
9. M=|0-2 0 oo 
O35 H1 1000 
1200 
Bi 5430 
1111 


Determine which of the following matrices are diagonalizable. Whenever a matrix is diagonalizable, write out
the diagonalization of M.


2-3 5 10 -3 
a M=(5%) 15. M=[0-2 0 


03 -1 
1 1 
4. m=(1,5)
100 1000
16. M=]121 1200 

542 mime ee 

ae ti id 
i we lo—32 

001 


Prove or disprove the following statements: 


19. If $v$ is an eigenvector of a transformation $L$, then so is $\alpha v$ for any nonzero scalar $\alpha$.

20. If $v$ and $w$ are eigenvectors of a transformation $L$ having the same eigenvalue $\lambda$, then $v + w$ is
also an eigenvector with eigenvalue $\lambda$.

21. If $v_1$ and $v_2$ are eigenvectors of a transformation $L$ having the same eigenvalue $\lambda$, then any linear
combination of $v_1$ and $v_2$ is also an eigenvector with the same eigenvalue.

22. If $v_1$ is an eigenvector of $L$ with eigenvalue $\lambda_1$ and $v_2$ is an eigenvector of $L$ with eigenvalue $\lambda_2$
then $v_1 + v_2$ is also an eigenvector of $L$.

23. If $v$ is an eigenvector of $M$, then $v^T$ is an eigenvector of $M^T$.

24. If $\lambda$ is an eigenvalue of invertible matrix $M$, then $\frac{1}{\lambda}$ is an eigenvalue of $M^{-1}$.

25. If $\lambda$ is an eigenvalue of $M$ then $\lambda^2$ is an eigenvalue of $M^2$.


Additional Exercises. 


26. Let a matrix $M$ have eigenspaces $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_k$ with bases $\mathcal{B}_1, \mathcal{B}_2, \ldots, \mathcal{B}_k$. Why does Theorem
6.2.16 tell us that $\mathcal{B}_1 \cup \mathcal{B}_2 \cup \ldots \cup \mathcal{B}_k$ is a linearly independent set? That is, the union of bases of
eigenspaces of a matrix forms a linearly independent set.

27. Show that $QDQ^{-1} = M$ in Example 6.2.26.

28. Prove that the eigenvalues of a diagonal matrix are the diagonal entries.

29. Verify that the eigenvalues of the triangular matrix below are its diagonal entries. (This fact is true
for upper and lower triangular matrices of any dimension.)
\[ \begin{pmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{pmatrix} \]

Hint: if $\lambda$ is an eigenvalue of a matrix $A$, what do you know about the matrix $A - \lambda I$?
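30. Consider the heat diffusion operator $E : \mathbb{R}^m \to \mathbb{R}^m$ with standard basis matrix representation
\[ \begin{pmatrix} 1-2\delta & \delta & 0 & \cdots & 0 \\ \delta & 1-2\delta & \delta & & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & & \delta & 1-2\delta & \delta \\ 0 & \cdots & 0 & \delta & 1-2\delta \end{pmatrix}, \]
where $0 < \delta < \frac{1}{4}$. Show that the $k$th eigenvector $v_k$ ($1 \leq k \leq m$) is given by
\[ v_k = \left( \sin\frac{\pi k}{m+1}, \sin\frac{2\pi k}{m+1}, \sin\frac{3\pi k}{m+1}, \ldots, \sin\frac{(m-1)\pi k}{m+1}, \sin\frac{m\pi k}{m+1} \right)^T \]
and provide the $k$th eigenvalue. Discuss the relative size of the eigenvalues. Is the matrix
diagonalizable?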

31. Complete Example 6.2.3 by finding all eigenvalues and bases for each eigenspace. Is the 
transformation diagonalizable? 

32. Prove Corollary 6.2.17. 

33. Prove that every eigenspace $\mathcal{E}_\lambda$ of linear transformation $L : V \to V$ is a subspace of $V$.

34. Prove Theorem 6.2.23. 

35. Prove Theorem 6.2.24. 

36. To assist with the proof of Theorem 6.2.11, show that for any square matrix $M$ and diagonal matrix
$D$ that $(M + D)^T = M^T + D$.

37. Consider an invertible diagonalizable transformation $T = QDQ^{-1}$, where $D$ is diagonal. Show
that the inverse transformation $T^{-1} = QD^{-1}Q^{-1}$.

38. Let $\mathcal{F}(\mathbb{R}) = \{f : \mathbb{R} \to \mathbb{R} \mid f \text{ is differentiable and continuous}\}$ and $L : \mathcal{F}(\mathbb{R}) \to \mathcal{F}(\mathbb{R})$ be defined
by $L(f(x)) = f'(x)$. What are the eigenvectors (eigenfunctions) of this transformation?

39. An $n \times n$ matrix $A$ is said to be similar to an $n \times n$ matrix $B$ if there exists an invertible matrix $Q$
so that $A = QBQ^{-1}$. Prove that similarity satisfies the following properties³. Hence, a matrix is
diagonalizable if and only if it is similar to a diagonal matrix.


(a) A is similar to A. 
(b) If A is similar to B then B is similar to A. 
(c) If A is similar to B and B is similar to C, then A is similar to C. 


6.3 Explorations: Long-Term Behavior and Diffusion Welding Process 
Termination Criterion 


A manufacturing company uses a diffusion welding process to adjoin several smaller rods into a single 
longer rod. These thin Beryllium-Copper alloy parts are used in structures with strict weight limitations 
but also require strength typical of more common steel alloys. Examples include satellites, spacecraft, 
non-ferromagnetic tools, miscellaneous small parts, aircraft, and racing bikes. 

The diffusion welding process leaves the final rod unevenly heated with hot spots around the weld 
joints. The rod ends are thermally attached to heat sinks which allow the rod to cool after the bonding 
process as the heat is drawn out the ends. An example heat state of a rod taken immediately after weld 
completion is shown in Figure 6.2. 

The data points represent the temperature at 100 points along a rod assembly. In this example, there 
are four weld joints as indicated by the four hot spots (or spikes). We begin this section by considering 
the heat state several time steps in the future. We then guide the reader through an exploration in 
determining the criterion for safe removal of a welded rod.


6.3.1 Long-Term Behavior in Dynamical Systems 
In the example of diffusion welding, we are interested in knowing the heat state of a rod at any time after 
the welds are completed. We will explore this application in the context of eigenvalues and eigenvectors 


of the diffusion matrix. Recall, we have the diffusion matrix E so that 


u(kAt) = E*u(0). 


³ These properties define an equivalence relation on the set of $n \times n$ matrices.

[Figure 6.2: plot titled "Rod Temperature at Time of Weld Completion"; horizontal axis: Position Along Rod (m).]

Fig. 6.2 An example heat state right after a weld is finished.


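We know that if $v_1, v_2, \ldots, v_m$ are eigenvectors of $E$ with corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$
and if
\[ u(0) = \alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_m v_m \]
then
\[ u(k\Delta t) = \alpha_1 \lambda_1^k v_1 + \alpha_2 \lambda_2^k v_2 + \ldots + \alpha_m \lambda_m^k v_m. \]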


The remainder of this section is an exploration to discover what these eigenvectors look like, what the 
corresponding eigenvalues are, and how to describe the behavior of the diffusion based on these. The 
reader will use the following MATLAB or OCTAVE code found at IMAGEMath.org. 


HeatEqnClassDemos.m 
EigenStuffPlot.m 
DiffuseLinearCombination.m 
HeatStateLibrary.m 
HeatDiffusion.m 
EvolutionMatrix.m 


6.3.2 Using MATLAB/OCTAVE to Calculate Eigenvalues and Eigenvectors 


The following exercises include both MATLAB/OCTAVE tasks and discussion points. 


1. Watch the graphical demonstration of heat state diffusion by typing the following command:


HeatEqnClassDemos(1);


What characteristics of heat flow do you observe in this demonstration?
2. Find the eigenvectors and eigenvalues for the heat diffusion transformation. Begin by running the
function that creates the diffusion matrix using the commands below:


Is E as you expected?
What is the value of δ used by this code? Use the eig command to find the eigenvalues and
eigenvectors as follows:


[V,D]=eig(E);


Now, we want to verify that V is the matrix whose columns are eigenvectors and D is the matrix whose
diagonal entries are the eigenvalues.
Now, we want to verify that V is the matrix whose columns are eigenvectors and D is the matrix whose 
diagonal entries are the eigenvalues. 
3. Show that the matrix D is actually diagonal by typing 
D 


4. Now verify that the first column of V is the eigenvector of E whose eigenvalue is the first diagonal 
entry of D. Do this by typing 


E*V(:,1) 
D(1,1)*V(:,1) 


What does the first of these commands do? What does the second do? How do the outputs compare? 
5. Using similar syntax, show that the second column of V is an eigenvector of E whose eigenvalue
is the second diagonal entry of D. (You may notice that some entries in the eigenvectors may
be represented by a very small value $\approx 10^{-16}$. This is a numerical artifact; such small values in
relation to other entries should be taken to be zero.)
6. Type 


L=diag(D) 


This should output a vector. What are the elements of this vector? (Caution: the diag command 
has many uses other than extracting diagonal elements.) 


The exercises above have led you through an exploration with m = 5. In the following exercises, you 
will visualize the heat diffusion when m = 100; this is the (coordinate) vector space of heat states 
shown in the demonstration of Exercise 1. 


7. Now repeat Exercises 2 and 6 with m=100 to get the new eigenvectors and eigenvalues of E. It is 
a good idea to suppress outputs by appending commands with a semicolon. 

8. Below are commands for viewing 5 eigenvectors with their corresponding eigenvalues. Plot these 
by typing 


choices=[80,85,90,95,100]; 
EigenStuffPlot(V(:,choices),L(choices));


How are the individual eigenvectors similar or dissimilar? 
9. Make some observations about the relationship between these eigenvectors and eigenvalues. 
10. Choose different eigenvectors to view from the list of m = 100. Plot these eigenvectors using 
similar syntax to that given in Exercise 8. 
11. Write a list of observations relating eigenvectors and eigenvalues. View more choices of eigenvectors
as needed.

In Section 6.1, we wrote an arbitrary heat state as a linear combination of the eigenvectors. In
the following exercises, we will explore diffusion on a heat state that is a linear combination of
eigenvectors.


12. In this exercise, we consider a heat state made of a linear combination of two eigenvectors $\beta_{j_1}$ and
$\beta_{j_2}$ with weights $\alpha_{j_1}$ and $\alpha_{j_2}$. To view the diffusion of this heat state over k = 50 time steps, type
the following commands:

choices=[60,80];
alpha=[1,-0.25];
k=50;
DiffuseLinearCombination(V(:,choices),L(choices),alpha,k);


What linear combination did you plot? 

13. Repeat Exercise 12 with a different choice of eigenvectors (choices) and scalars (alpha). The 
values in alpha should be chosen between —2 and 2 so that the code is numerically stable. You 
can also change k, but making it larger than 500 can result in a long computation and plotting 
time. You can also change the pause time between frames by giving a fifth input to the function 

DiffuseLinearCombination which is the pause time in seconds. 

14. Make a list of observations about the diffusion of a linear combination of two eigenvectors. Try 

various linear combinations as needed. 

15. Next, consider the evolution of the linear combination of five eigenvectors shown below: 


choices=[60,70,80,90,100]; 
alpha=[1,-1,1,-1,1]; 
k=100; 


What linear combination did you plot? 

16. Try various heat states that are linear combinations of 5 eigenvectors and make a list of observations 
about the diffusion of such heat states. 

17. Use the above explorations to make a statement about diffusion details for an arbitrary heat state
$u = \alpha_1 \beta_1 + \alpha_2 \beta_2 + \ldots + \alpha_m \beta_m$, where the $\beta_j$ are eigenvectors.


6.3.3 Termination Criterion


Let us now consider a company that makes parts such as those described above. The company engineers 
have determined, through various experiments, that the rod can be removed from the apparatus once 
the thermal stress drops below a given level. The thermal stress is proportional to the derivative of 
temperature with respect to position. One way to ensure this condition is to allow the rod temperature 
to equilibrate to a specified low level, but during this time the machine is unavailable for use. 


The Goal 


Your goal is to use your knowledge of linear algebra to provide and test a method for determining the 
earliest possible time at which a rod assembly can be removed safely. 


The Safe Removal Criterion 


Consider an initial heat state vector $u(0) = (u_1(0), u_2(0), \ldots, u_{100}(0))$ such as the one depicted in
Figure 6.2. Consider also an orthonormal eigenbasis $\{v_1, v_2, \ldots, v_{100}\}$ of the heat diffusion operator $E$.
Let the eigenvectors be ordered by decreasing eigenvalue. Any heat state can be uniquely written in
terms of this eigenbasis. In particular, suppose
\[ u(t) = \alpha_1(t)v_1 + \alpha_2(t)v_2 + \cdots + \alpha_{100}(t)v_{100}. \]

The rod is safe to remove from the apparatus when $|\alpha_k(t)| < |\alpha_w(t)|$ for all $k > w$, where $w$ is the
number of diffusion welds in the rod. In the example above, $w = 4$ and the rod can be safely removed
at any time when $|\alpha_k(t)| < |\alpha_4(t)|$ for $k = 5, 6, \ldots, 100$.


Predicting Safe Removal Time 


In this task you will predict (based on your intuition and knowledge of heat flow) the order in which 
several diffusion weld scenarios attain the safe removal criterion. 


1. Collect the following files for use in MATLAB/OCTAVE: PlotWeldState.m and WeldStates.mat. 
The first is a script that makes nice plots of weld states, the second is a data file containing raw 
temperature heat state vectors with variable names hs1 through hs5. 

2. Display an initial heat state by running the MATLAB/OCTAVE commands 


load WeldStates.mat
PlotWeldState(hs)


where hs is a vector of heat state temperatures, such as hs5. While not necessarily clear from the 
heat state plots, each sample heat state corresponds to a diffusion weld process with exactly four 
weld locations. Some of these welds are close together and difficult to distinguish in the heat state 
signature. 

3. Predict the order in which these example heat states will achieve the safe removal criterion. (Will 
heat state hs1 reach the safe removal criterion before or after heat state hs2?, etc.) Do not perform 
any computations. Use your knowledge of heat diffusion and eigenstate evolution. The goal here 
is to produce sound reasoning and justification, not necessarily the correct answer. We will find 
the correct answer later in this lab. 


Computing the Eigenbasis for E 


In this task you will compute the eigenbasis for the diffusion operator $E$ and the initial coefficients
$\alpha(0) = (\alpha_1(0), \alpha_2(0), \ldots, \alpha_{100}(0))$ for the given heat states.


1. Compute the diffusion operator E, a set of eigenvectors as columns of matrix V and corresponding 
eigenvalues L. Use the methods from Section 6.3.2. 


2. Reorder the eigenvectors (and eigenvalues) in order of decreasing eigenvalue. Here is the code to 
do this. 


[L,idx]=sort(L,'descend');
V=V(:,idx);


3. V is a matrix whose columns are the eigenvectors of E. Examine the matrix $Q = V^T V$. What do
the properties of Q tell you about the eigenvectors $v_k$ (columns of V)?

4. How can you compute the eigenbasis coefficients $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_{100})$ for any heat state $u$ with a
single matrix multiplication? Compute these coefficients for the example heat states (hs1 through
hs5) and plot them. Observe that the safe removal criterion is not met for any of these initial states.


Determining Safe Removal Times


In this task you will determine the safe removal times for each example heat state by considering the
time-dependent coefficients $\alpha(t) = (\alpha_1(t), \alpha_2(t), \ldots, \alpha_{100}(t))$.


1. Suppose we have a rod with $w$ diffusion welds with initial heat state described by the eigenvector
coefficients $\alpha(0) = (\alpha_1(0), \alpha_2(0), \ldots, \alpha_{100}(0))$ with corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_{100}$.
Devise a computational procedure for quickly determining how many time steps must pass before
the safe removal criterion is met.

2. Apply your procedure to the given five initial heat states, determining the minimum times, say
$t_1$, $t_2$, $t_3$, $t_4$, and $t_5$, at which the rods can be safely removed.

3. Compare your computed removal time values with the order of safe removal you predicted in 
Task #1. Explain carefully why your predicted order did, or did not, match your computed order. 


6.3.4 Reconstruct Heat State at Removal 


In this task you will reconstruct the heat state at the time of safe removal and make some final 
conclusions. 


1. For each of the rods, compute the heat state vector $u(t_j)$ where $j = 1, 2, 3, 4, 5$ and $t_j$ is the safe
removal time for heat state $j$ which you found in Task #3.

2. Discuss the nature of these heat states with particular attention to the original goal of removing 
the rods when the thermal stress is low. 


6.4 Markov Processes and Long-Term Behavior 


Our study of the evolution of heat states centered around understanding the effects of multiple repeated 
applications of the heat diffusion transformation to an initial heat state. We observed the following: 


• Heat states that are eigenvectors (we call such heat states “eigenstates”) transform by an eigenvalue
scaling.

• All eigenvalues for the heat diffusion transformation are between 0 and 1, corresponding to our
observation that all heat states diffuse toward the zero heat state.

• Eigenstates are sinusoidal in shape, and eigenstates with higher frequency have lower eigenvalues,
corresponding to a faster diffusion.

• The heat diffusion transformation has a basis of eigenvectors, so the evolution of arbitrary heat
states can be simply calculated in the coordinate space of an eigenbasis (by scaling each coordinate
by the corresponding eigenvalue).

• In the diffusion of an arbitrary heat state, the components corresponding to higher frequency
eigenstates disappear first, while the components corresponding to lower frequency eigenstates
persist longer.


In this way, the result of repeated applications is determined by the nature of the eigenvalues. We can
make similar observations about linear transformations whose eigenvalues can be greater than one. As
with the heat diffusion, the contribution for an eigenvector $v_j$ is diminished over time if it corresponds
to the eigenvalue $\lambda_j$ with $0 < \lambda_j < 1$. But if some eigenvalue $\lambda_j > 1$, then over time the contribution
of the corresponding eigenvector $v_j$ is magnified.

In this section, we seek to understand yet more about the behavior of vector sequences that result
from repeated application of a transformation $E : \mathbb{R}^n \to \mathbb{R}^n$. In other words, given a starting vector
$u(0) \in \mathbb{R}^n$, we define the sequence $u(0), u(1), u(2), \ldots$ by $u(k) := E^k u(0)$, and we are interested
in how $u(k)$ changes as $k$ increases. One important question we might ask is whether the sequence
approaches some limit, that is, as we repeatedly apply the transformation, is the output getting closer
and closer to some vector? In mathematical terms, we are asking whether the limit $u^* = \lim_{k\to\infty} u(k)$
exists, and, if it does, how easily it might be found. This type of analysis appears in a variety of contexts,
a few of which are mentioned here.


• Consider a diffusion welding process. We have an initial heat state $u(0) \in \mathbb{R}^m$ and a diffusion
operator $E : \mathbb{R}^m \to \mathbb{R}^m$. The heat state $u(k) = E^k u(0)$ is the heat state $k$ units of time later than
the state $u(0)$. The repeated application of $E$ assisted us in finding a time at which we could safely
remove a rod from the welding apparatus. We expect that $u^*$ is the zero heat state because heat is
continually lost at the ends of the rod without replacement. We would like to know if this behavior
occurs slowly or rapidly and if this property is a general property of transformations.

• Consider the animal populations of a particular habitat as a vector $u$ in $\mathbb{R}^m$: let the $j$th component,
$u_j(k)$, be the population of species $j$ at time $k$. If future populations can be (linearly) modeled
entirely based on current populations, then we have a population dynamics model $u(k) = E^k u(0)$
that predicts the population of all $m$ species at each time step $k$. We might wonder if any population
ever exceeds a critical value or if any species die out. These questions revolve around the limiting
behavior.

• Consider a simple weather model that predicts the basic state of the weather (sunny, partly cloudy,
rainy) based on the observed weather of recent previous days. The key idea is that the state of the
weather tends to change on time scales of more than one day. For example, if it has been sunny
for three days, then it is most likely to remain sunny the next day. The state vector contains three
values: $p_j$, the probability that the current day’s weather will be of type $j$, for $j = 1, 2, 3$. This model
cannot accurately predict weather because, among other difficulties, it only predicts the probability
of the current weather state. However, the long-term behavior of the model provides information on
how often, on average, each weather type occurs.


These examples motivate us to understand when a limit matrix, $\lim_{k\to\infty} E^k$, exists and how to quickly
compute it. We will find that limiting behavior and characteristics are determined by the eigenvalues
and eigenspaces of the transformation matrix.

Whenever we discuss vectors in this section, it is always in the context of multiplying them by
matrices. Therefore, it only makes sense that the definitions herein expect that the vectors lie in a
real-valued vector space $\mathbb{R}^n$ such as a coordinate vector space.


6.4.1 Matrix Convergence 


We begin with some essential ideas on what it means for a sequence of matrices to converge and some 
key arithmetic results. In particular, we want convergence to be defined component-wise on matrix 
elements, and we want to know when matrix multiplication and taking limits commute. These results 
allow us to perform arithmetic with limit matrices. 


Definition 6.4.1 


Let $\mathcal{A} = \{A_1, A_2, \ldots\}$ be a sequence of $m \times p$ real matrices. $\mathcal{A}$ is said to converge to the $m \times p$
real limit matrix $L$ if

\[ \lim_{k\to\infty} (A_k)_{ij} = L_{ij} \]

for all $1 \le i \le m$ and $1 \le j \le p$. In the case when $p = 1$, we call the limit a limit vector.


This definition simply says that a sequence of matrices (of a fixed size) converges if there is entry- 
wise convergence. 


Notation. 
In previous sections of this text, we used the notation $A_k$ to denote a column of the matrix $A$. In this
chapter, we are using the same notation to indicate a term in a sequence of matrices. Be aware of the 
context as you read this chapter. 

The next lemma tells us that left or right multiplication by a fixed matrix to each sequence element 
does not destroy the existence of a limit matrix. 


Lemma 6.4.2 
Let $\mathcal{A} = \{A_1, A_2, \ldots\}$ be a sequence of $m \times p$ real matrices that converge to the limit matrix $L$.
Then for any $\ell \times m$ real matrix $B$, $\lim_{k\to\infty} BA_k$ exists and

\[ \lim_{k\to\infty} BA_k = BL. \]

Also, for any $p \times n$ real matrix $C$, $\lim_{k\to\infty} A_k C$ exists and

\[ \lim_{k\to\infty} A_k C = LC. \]


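Proof. Suppose $\mathcal{A} = \{A_1, A_2, \ldots\}$ is a sequence of $m \times p$ real matrices that converge to the limit
matrix $L$. Let $B$ be an arbitrary $\ell \times m$ real matrix and let $C$ be an arbitrary $p \times n$ real matrix.


(a) We show that $\lim_{k\to\infty} BA_k$ exists and equals $BL$ by showing that corresponding entries of
$\lim_{k\to\infty} BA_k$ and $BL$ are equal. For $1 \le i \le \ell$ and $1 \le j \le p$,

\[
\left(\lim_{k\to\infty} BA_k\right)_{ij}
= \lim_{k\to\infty} \sum_{r=1}^{m} B_{ir} (A_k)_{rj}
= \sum_{r=1}^{m} B_{ir} \lim_{k\to\infty} (A_k)_{rj}
= \sum_{r=1}^{m} B_{ir} L_{rj}
= (BL)_{ij}.
\]

The limit passes through the finite sum because each entry limit $\lim_{k\to\infty} (A_k)_{rj} = L_{rj}$ exists.

(b) We show that $\lim_{k\to\infty} A_k C$ exists and equals $LC$ by showing that corresponding entries of
$\lim_{k\to\infty} A_k C$ and $LC$ are equal. [The remainder of the proof is similar to that of part (a) and
is the subject of Exercise 6.]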


There are some important consequences associated with Lemma 6.4.2. 


• It is possible for $v_k = A_k u$ to exist for every $k$ and for $\lim_{k\to\infty} v_k$ to exist, but for $\lim_{k\to\infty} A_k$ to
not exist. For example, if $u$ is the zero vector, then $v_k = A_k u$ is the zero vector. So $\lim_{k\to\infty} v_k = 0$,
but $A_k$ may not have a limit.

• Regardless of choice of basis $\mathcal{B}$, if $\lim_{k\to\infty} A_k [v]_{\mathcal{B}}$ exists, the limit will always be the same vector,
though expressed as a coordinate vector relative to the chosen basis.

• If we have a sequence of vectors $\{u_k\}$ created by $u_k = A^k u_0$, then (if the limit matrix exists)

\[ \lim_{k\to\infty} u_k = \left( \lim_{k\to\infty} A^k \right) u_0. \]


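Example 6.4.3 Consider the sequence of matrices $\mathcal{A} = \{A_1, A_2, \ldots\}$ defined by

\[ A_n = \begin{pmatrix} 1 + 2^{-n} & 1^n \\ 1 & \frac{1}{n} \end{pmatrix}. \]

The first three terms are

\[
A_1 = \begin{pmatrix} \frac{3}{2} & 1 \\ 1 & 1 \end{pmatrix}, \quad
A_2 = \begin{pmatrix} \frac{5}{4} & 1 \\ 1 & \frac{1}{2} \end{pmatrix}, \quad
A_3 = \begin{pmatrix} \frac{9}{8} & 1 \\ 1 & \frac{1}{3} \end{pmatrix}.
\]

The limit matrix $L$ is defined element-wise. That is,

\[
L = \lim_{n\to\infty} A_n
= \begin{pmatrix} \lim_{n\to\infty} \left(1 + 2^{-n}\right) & \lim_{n\to\infty} 1^n \\ \lim_{n\to\infty} 1 & \lim_{n\to\infty} \frac{1}{n} \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}.
\]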


Furthermore, for any fixed matrix $C$ with two rows, Lemma 6.4.2 gives

\[ \lim_{n\to\infty} A_n C = \left( \lim_{n\to\infty} A_n \right) C = LC. \]
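
The entry-wise convergence in Example 6.4.3 can be watched numerically. The following is a minimal sketch in plain Python, assuming the sequence is $A_n$ with rows $(1 + 2^{-n},\ 1^n)$ and $(1,\ 1/n)$ as read from the example; the helper names `A` and `max_entry_gap` are hypothetical and chosen only for this illustration.

```python
def A(n):
    """Return the n-th matrix of the sequence (as read from Example 6.4.3)."""
    return [[1 + 2.0 ** (-n), 1.0 ** n],
            [1.0, 1.0 / n]]

# Candidate limit matrix, obtained by taking the limit of each entry.
L = [[1.0, 1.0],
     [1.0, 0.0]]

def max_entry_gap(M, N):
    """Largest absolute difference between corresponding entries of M and N."""
    return max(abs(M[i][j] - N[i][j])
               for i in range(len(M))
               for j in range(len(M[0])))

# The gap shrinks as n grows, which is what entry-wise convergence means.
for n in (1, 10, 100, 1000):
    print(n, max_entry_gap(A(n), L))
```

The slowest entry is $1/n$, so the largest gap behaves like $1/n$ for large $n$.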


Recall, our goal is to determine the long-term behavior of a process defined by repeated multiplication 
on the left by a matrix. The next lemma and theorem provide our first important results concerning 
sequences of matrices given by repeated left matrix multiplication. 


Lemma 6.4.4 
Let $A$ be a real $n \times n$ matrix and suppose the limit matrix $L = \lim_{k\to\infty} A^k$ exists. Then, for any
$n \times n$ invertible real matrix $Q$, $\lim_{k\to\infty} (QAQ^{-1})^k = QLQ^{-1}$.


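Proof. Suppose $A$ is a real $n \times n$ matrix for which the limit matrix $L = \lim_{k\to\infty} A^k$ exists, and let
$Q$ be any $n \times n$ invertible real matrix. We show directly that $\lim_{k\to\infty} (QAQ^{-1})^k = QLQ^{-1}$. Using
Lemma 6.4.2, we have

\[
\begin{aligned}
\lim_{k\to\infty} \left(QAQ^{-1}\right)^k
&= \lim_{k\to\infty} \left[ \left(QAQ^{-1}\right)\left(QAQ^{-1}\right) \cdots \left(QAQ^{-1}\right) \right] \quad (k \text{ factors}) \\
&= \lim_{k\to\infty} Q A^k Q^{-1} \\
&= Q \left( \lim_{k\to\infty} A^k \right) Q^{-1} \\
&= QLQ^{-1}.
\end{aligned}
\]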


Theorem 6.4.5 
Let $A$ be an $n \times n$ real diagonalizable matrix. Every eigenvalue of $A$ satisfies $-1 < \lambda \le 1$ if and
only if $L = \lim_{k\to\infty} A^k$ exists.


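Proof. ($\Rightarrow$) Suppose $A$ is an $n \times n$ real diagonalizable matrix and every eigenvalue of $A$ satisfies
$-1 < \lambda \le 1$. Since $A$ is diagonalizable, $A = QDQ^{-1}$, where $D$ is the diagonal matrix of eigenvalues of
$A$ and the columns of $Q$ form an eigenbasis for $\mathbb{R}^n$. Notice that if we define $D^\infty$ as $D^\infty = \lim_{k\to\infty} D^k$,
then we know $D^\infty$ exists because

\[
\lim_{k\to\infty} \lambda^k =
\begin{cases}
1 & \text{if } \lambda = 1, \\
0 & \text{if } -1 < \lambda < 1.
\end{cases}
\]

Then, using Lemma 6.4.4,

\[
\lim_{k\to\infty} A^k = \lim_{k\to\infty} Q D^k Q^{-1} = Q \left( \lim_{k\to\infty} D^k \right) Q^{-1} = Q D^\infty Q^{-1}.
\]

Thus, $\lim_{k\to\infty} A^k$ exists.

($\Leftarrow$) Now, suppose $A$ is an $n \times n$ real diagonalizable matrix for which $L = \lim_{k\to\infty} A^k$ exists. We
know that there exist matrices $Q$ and $D$ so that $A = QDQ^{-1}$, where $D$ is the diagonal matrix

\[
D = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}
\]

whose diagonal entries are eigenvalues of $A$. We also know that, for $k \in \mathbb{N}$, $A^k = QD^kQ^{-1}$, where

\[
D^k = \begin{pmatrix} \lambda_1^k & & & \\ & \lambda_2^k & & \\ & & \ddots & \\ & & & \lambda_n^k \end{pmatrix}.
\]

Since $D^k = Q^{-1}A^kQ$, Lemma 6.4.2 tells us that $\lim_{k\to\infty} D^k$ exists. Now, $\lim_{k\to\infty} D^k$ exists only when
$\lim_{k\to\infty} \lambda_j^k$ exists for $1 \le j \le n$. That is, $\lim_{k\to\infty} D^k$ exists only when
$-1 < \lambda_j \le 1$ for each eigenvalue $\lambda_j$. Thus, $\lim_{k\to\infty} A^k$ exists only under the same condition.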


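Example 6.4.6 Consider a $2 \times 2$ matrix $A$ whose first row is $(1/2 \;\; 0)$ and whose lower-right entry is $3/4$.
$A$ is diagonalizable as

\[ A = QDQ^{-1}, \quad \text{where} \quad D = \begin{pmatrix} 1/2 & 0 \\ 0 & 3/4 \end{pmatrix}. \]

Since $A$ has eigenvalues $\lambda_1 = 1/2$ and $\lambda_2 = 3/4$, Theorem 6.4.5 tells us that $\lim_{k\to\infty} A^k$ exists. In this
case, $\lim_{k\to\infty} D^k$ is the $2 \times 2$ zero matrix. So, $\lim_{k\to\infty} A^k$ is also the $2 \times 2$ zero matrix.


Example 6.4.7 Consider the matrix

\[ A = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}. \]

$A$ has eigenvalues $\lambda_1 = \lambda_2 = -1$ with corresponding eigenspace

\[ \mathcal{E} = \operatorname{span}\left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}. \]

So, $A$ is diagonalizable as

\[
A = QDQ^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]

Since the eigenvalues are not in the interval $(-1, 1]$, Theorem 6.4.5 tells us that $\lim_{k\to\infty} A^k$ does not
exist. In this case, $A^k = -I_2$ for $k$ odd and $A^k = I_2$ for $k$ even.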

6.4.2 Long-Term Behavior 


Definition 6.4.8 


Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a transformation defined by $T(v) = Mv$ where $M$ is a real $n \times n$ matrix, and
let $v_0 \in \mathbb{R}^n$. Then, the long-term state of $v_0$ relative to $T$ is given by $v^* = \lim_{k\to\infty} M^k v_0$ whenever
this limit exists.


Recall that, in the heat diffusion application, we can describe long-term heat states according to
the eigenvalues of the diffusion transformation. In fact, we found that, for initial heat state $v_0$, the
long-term state relative to the diffusion transformation defined by $T(h) = Eh$ was described by

\[
\lim_{k\to\infty} E^k v_0 = \lim_{k\to\infty} \left( \alpha_1 \lambda_1^k v_1 + \alpha_2 \lambda_2^k v_2 + \cdots + \alpha_n \lambda_n^k v_n \right)
\]

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are eigenvalues of $E$ corresponding to eigenvectors $v_1, v_2, \ldots, v_n$, and
$\alpha_1, \alpha_2, \ldots, \alpha_n$ are the coordinates of $v_0$ in this eigenbasis. We can extend
this idea to any transformation. That is, if a state evolution transformation matrix is diagonalizable
with eigenvalues satisfying $-1 < \lambda \le 1$, then we can find a simple expression for the long-term state.

Thus, the long-term state of a vector relative to a diagonalizable state transformation can be simply
written in terms of the eigenvectors of $M$ as long as every eigenvalue of $M$ satisfies $-1 < \lambda \le 1$.
The next theorem describes the long-term state in terms of the eigenvalues and their corresponding
eigenvectors.


Theorem 6.4.9 
Let $A$ be an $n \times n$ real diagonalizable matrix where every eigenvalue of $A$ satisfies $-1 < \lambda_j \le 1$,
with corresponding eigenbasis $\mathcal{E} = \{Q_1, Q_2, \ldots, Q_n\}$. Then the long-term state of any $v \in \mathbb{R}^n$
relative to $A$ is

\[ v^* = \sum_{j \in J} \alpha_j Q_j, \]

where $\alpha = [v]_{\mathcal{E}}$ is the corresponding coordinate vector of $v$ and $J = \{ j \in \mathbb{N} \mid \lambda_j = 1 \}$.


The notation in the theorem allows for repeated eigenvalues. That is, it is possible that $\lambda_i = \lambda_j$ for
$i \ne j$. Notice also that $J$ is a set of indices allowing us to keep track of all eigenvalues equal to 1.


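Proof. Suppose $A$ is an $n \times n$ real diagonalizable matrix where every eigenvalue of $A$ satisfies
$-1 < \lambda_j \le 1$. Define $\mathcal{E} = \{Q_1, Q_2, \ldots, Q_n\}$ to be an eigenbasis of $A$. Finally, let $J = \{ j \in \mathbb{N} \mid \lambda_j = 1 \}$.
We know that there are matrices $Q$ and $D$ so that $A = QDQ^{-1}$. In fact, the columns of $Q$ are
$Q_1, Q_2, \ldots, Q_n$ and $D$ is diagonal with diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let $v$ be any vector in $\mathbb{R}^n$,
and define $\alpha = [v]_{\mathcal{E}}$. Then, $\alpha = Q^{-1}v$.
We also know that, for any $k \in \mathbb{N}$, $D^k$ is the diagonal matrix whose diagonal entries are $\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k$.
Since

\[
\lim_{k\to\infty} \lambda_j^k =
\begin{cases}
0 & \text{if } j \notin J, \\
1 & \text{if } j \in J,
\end{cases}
\]

we find that the long-term state of $v$ relative to $A$ is

\[
\begin{aligned}
v^* &= \lim_{k\to\infty} \left( A^k v \right) = \lim_{k\to\infty} \left( Q D^k Q^{-1} v \right) \\
&= \lim_{k\to\infty} \left( Q D^k \alpha \right) = \lim_{k\to\infty} \sum_{j=1}^{n} \lambda_j^k \alpha_j Q_j \\
&= \sum_{j=1}^{n} \left( \lim_{k\to\infty} \lambda_j^k \right) \alpha_j Q_j \\
&= \sum_{j \in J} \left( \lim_{k\to\infty} \lambda_j^k \right) \alpha_j Q_j + \sum_{j \notin J} \left( \lim_{k\to\infty} \lambda_j^k \right) \alpha_j Q_j \\
&= \sum_{j \in J} \alpha_j Q_j.
\end{aligned}
\]


Table 6.1 Percent of population described in each column who move to (or remain in) the
location described in each row.

              inner city   downtown   suburbs   out-of-area
inner city        93           2          3          0
downtown           1          67         12          3
suburbs            3          30         76         18
out-of-area        3           1          9         79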


Example 6.4.10 The heat diffusion operator is an example of a dissipative transformation. The total
heat in the heat state is not constant, but decreases over time. This is a direct consequence of the fact that
all of the eigenvalues have magnitude less than one: $|\lambda_j| < 1$ for all $j$. According to Theorem 6.4.9,
the long-term state is $u^* = 0$, so all heat eventually leaves the system.
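
The loss of total heat can be illustrated with a small numerical experiment. The sketch below assumes the diffusion operator is the tridiagonal matrix with diagonal entries $1 - 2\delta$ and off-diagonal entries $\delta$ (the form that appears in the exercises of this section when $\epsilon = 0$); the values $m = 10$ and $\delta = 0.25$, the uniform initial state, and the function names are hypothetical choices made only for illustration.

```python
def diffusion_matrix(m, delta):
    """Assumed m x m diffusion operator: 1 - 2*delta on the diagonal, delta beside it."""
    E = [[0.0] * m for _ in range(m)]
    for i in range(m):
        E[i][i] = 1.0 - 2.0 * delta
        if i > 0:
            E[i][i - 1] = delta
        if i < m - 1:
            E[i][i + 1] = delta
    return E

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

E = diffusion_matrix(10, 0.25)   # illustrative size and diffusion constant
u = [1.0] * 10                   # illustrative initial heat state

# Record the total heat after each of the first 50 time steps.
totals = [sum(u)]
for _ in range(50):
    u = apply(E, u)
    totals.append(sum(u))

print(totals[:5])

# Many more steps: the state approaches the zero heat state.
for _ in range(2000):
    u = apply(E, u)
print(sum(u))
```

Each step removes heat only through the two end entries, which is why the total decreases steadily rather than all at once.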


Example 6.4.11 A recent four-year study of population movement around a large city made the 
following observations. The study considered four regions: inner city, downtown, suburbs, and out-of- 
area. Over a four-year period, 93% of inner city people remained living in the inner city, 1% moved 
to the downtown area, 3% moved to the suburbs, and 3% moved out-of-area. Similar data is reported 
for each category. A sample population of 10,000 was tracked (including if they moved out-of-area). 
Participants in the study were, at first, residents of the city or its suburbs. 

Table 6.1 shows, for example, that 30% of the people who lived downtown at the start of a four-year
period were reported to have moved to the suburbs by its end. Likewise, 18% of people who were former
city residents (now out-of-area) moved back to the city suburbs. Supposing that the trend indicated in the
study continues, we can determine the long-term state of the city’s total population of 246,500, which is
distributed as follows: 12,000 in the inner city, 4,500 downtown, and 230,000 in the suburbs.

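This population movement model is not dissipative because the total population does not change
(no new residents are counted). The population in any given four-year period is a state vector $v \in \mathbb{R}^4$.
We define

\[
v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix}
= \begin{pmatrix} \text{inner city population} \\ \text{downtown population} \\ \text{suburb population} \\ \text{out-of-area population} \end{pmatrix}.
\]

The movement of people over a four-year period is modeled by left matrix multiplication by
transformation matrix $E$ defined by

\[
E = \begin{pmatrix}
0.93 & 0.02 & 0.03 & 0.00 \\
0.01 & 0.67 & 0.12 & 0.03 \\
0.03 & 0.30 & 0.76 & 0.18 \\
0.03 & 0.01 & 0.09 & 0.79
\end{pmatrix}.
\]

If $v_k$ is the population distribution at time period $k$, then $v_{k+1} = Ev_k$ is the population distribution at
time period $k + 1$. We have

\[
v_0 = \begin{pmatrix} 12{,}000 \\ 4{,}500 \\ 230{,}000 \\ 0 \end{pmatrix},
\]

\[
v_1 = Ev_0 = \begin{pmatrix}
0.93 & 0.02 & 0.03 & 0.00 \\
0.01 & 0.67 & 0.12 & 0.03 \\
0.03 & 0.30 & 0.76 & 0.18 \\
0.03 & 0.01 & 0.09 & 0.79
\end{pmatrix}
\begin{pmatrix} 12{,}000 \\ 4{,}500 \\ 230{,}000 \\ 0 \end{pmatrix}
= \begin{pmatrix} 18{,}150 \\ 30{,}735 \\ 176{,}510 \\ 21{,}105 \end{pmatrix},
\]

and

\[
v_2 = Ev_1 = \begin{pmatrix}
0.93 & 0.02 & 0.03 & 0.00 \\
0.01 & 0.67 & 0.12 & 0.03 \\
0.03 & 0.30 & 0.76 & 0.18 \\
0.03 & 0.01 & 0.09 & 0.79
\end{pmatrix}
\begin{pmatrix} 18{,}150 \\ 30{,}735 \\ 176{,}510 \\ 21{,}105 \end{pmatrix}
\approx \begin{pmatrix} 22{,}790 \\ 42{,}588 \\ 147{,}712 \\ 33{,}411 \end{pmatrix}.
\]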


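The long-term population state is given by $v^* = \lim_{k\to\infty} E^k v_0$. If $E$ is diagonalizable and
$\lim_{k\to\infty} E^k = L$ exists, then we can use the results of this section to find
$v^* = Lv_0 = Q \left( \lim_{k\to\infty} D^k \right) Q^{-1} v_0$. The
eigenvalues of $E$ are $\lambda_1 = 1.0000$, $\lambda_2 \approx 0.9082$, $\lambda_3 \approx 0.7367$, $\lambda_4 \approx 0.5051$. $E$ has four distinct
eigenvalues and is diagonalizable as $E = QDQ^{-1}$ where

\[
D = \begin{pmatrix}
\lambda_1 & 0 & 0 & 0 \\
0 & \lambda_2 & 0 & 0 \\
0 & 0 & \lambda_3 & 0 \\
0 & 0 & 0 & \lambda_4
\end{pmatrix}
\quad \text{and} \quad
Q \approx \begin{pmatrix}
+0.415 & +0.843 & +0.115 & -0.031 \\
+0.322 & -0.219 & -0.439 & -0.543 \\
+0.753 & -0.465 & -0.446 & +0.806 \\
+0.397 & -0.159 & +0.771 & -0.232
\end{pmatrix}.
\]

Using the notation of Theorem 6.4.9, we have $J = \{1\}$ and $\alpha_1 \approx 130{,}647$ (the first entry of $Q^{-1}v_0$).
So, by Theorem 6.4.9,

\[
v^* = \alpha_1 Q_1 \approx (130{,}647)\, Q_1 \approx \begin{pmatrix} 54{,}173 \\ 42{,}114 \\ 98{,}328 \\ 51{,}885 \end{pmatrix}.
\]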


This long-term behavior analysis predicts that in the long term, about 54,000 people will be living 
in the inner city, about 42,000 downtown, about 98,000 in the suburbs, and about 52,000 will have 
moved out of the area. 
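
As a cross-check on the eigenvector calculation in Example 6.4.11, the long-term state can also be approximated by simply applying $E$ many times. The sketch below is plain Python; it assumes the starting vector $v_0 = (12{,}000,\ 4{,}500,\ 230{,}000,\ 0)$ used in the displayed computation of $v_1$, and the helper names `apply` and `iterate` and the choice of 500 steps are hypothetical.

```python
# Transition matrix from Example 6.4.11 (each column sums to 1).
E = [[0.93, 0.02, 0.03, 0.00],
     [0.01, 0.67, 0.12, 0.03],
     [0.03, 0.30, 0.76, 0.18],
     [0.03, 0.01, 0.09, 0.79]]

# Initial populations: inner city, downtown, suburbs, out-of-area.
v0 = [12000.0, 4500.0, 230000.0, 0.0]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def iterate(M, v, steps):
    """Apply M to v the given number of times."""
    for _ in range(steps):
        v = apply(M, v)
    return v

v1 = iterate(E, v0, 1)          # one four-year period
v_long = iterate(E, v0, 500)    # many periods, approximating the long-term state

print([round(x) for x in v1])
print([round(x) for x in v_long])
print(round(sum(v_long)))       # total population is preserved
```

Since the second-largest eigenvalue is about 0.9082, the contribution of the other eigenspaces is negligible long before 500 steps, so the iterate agrees with $\alpha_1 Q_1$ up to rounding.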


6.4.3. Markov Processes 


The population movement model of Example 6.4.11 is an example of a non-dissipative transformation.
The total population at each iteration is constant at 246,500, the initial population. We found that
the eigenvalues of the population movement transformation satisfied $-1 < \lambda_j \le 1$, and one of the
eigenvalues was exactly 1. The eigenspace corresponding to $\lambda = 1$ contained the long-term state $v^*$.

On the other hand, the heat diffusion transformation is an example of a dissipative transformation.
The total quantity of heat in the initial state $v_0$ is not preserved under transformation by $E$. We found
that the eigenvalues of this transformation satisfy $-1 < \lambda < 1$, no eigenvalue taking on the exact value
1. The long-term state was $v^* = 0$.

The difference in behavior of these transformations and their eigenvalues is
not a coincidence. While both transformations have matrix entries satisfying $0 \le E_{i,j} \le 1$, they differ
in the property of column stochasticity.


Definition 6.4.12


An $n \times n$ real matrix $A = (a_{ij})$ is said to be column stochastic, or a Markov matrix, if for all
$1 \le i, j \le n$, $a_{ij} \ge 0$ and $A^T \mathbf{1}_n = \mathbf{1}_n$, where $\mathbf{1}_n$ is the vector whose entries are all 1.


Definition 6.4.12 asserts that a Markov matrix is a square matrix of all nonnegative entries that has
columns that sum to 1. The following theorem, along with Theorem 6.4.9, allows us to understand the
long-term behavior of any column stochastic matrix.


Theorem 6.4.13
Let $A$ be a column stochastic matrix. Then every eigenvalue $\lambda$ of $A$ satisfies $|\lambda| \le 1$ and at least
one eigenvalue has value 1.


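Proof. Suppose $A$ is a column stochastic matrix. Then, $A^T \mathbf{1}_n = \mathbf{1}_n$, so $\mathbf{1}_n$ is an eigenvector of $A^T$ with
eigenvalue 1. Since, by Theorem 6.2.11, $A$ and $A^T$ have the same eigenvalues, at least one eigenvalue
of $A$ has value 1.

Now, consider any eigenvector $v = (v_1\ v_2\ \ldots\ v_n)^T$ of $A^T$ with eigenvalue $\lambda$. Suppose $v_\ell$ is the entry
in $v$ with largest absolute value and let $a_\ell = (a_{1\ell}\ a_{2\ell}\ \ldots\ a_{n\ell})^T$ be the $\ell$th column of $A$. We have

\[
a_\ell^T v = \lambda v_\ell \quad \text{and} \quad a_\ell^T v = \sum_{j=1}^{n} a_{j\ell} v_j.
\]

Therefore, since the entries $a_{j\ell}$ are nonnegative and sum to 1,

\[
|\lambda| |v_\ell| = \left| a_\ell^T v \right| \le \sum_{j=1}^{n} a_{j\ell} |v_j| \le \sum_{j=1}^{n} a_{j\ell} |v_\ell| = |v_\ell|.
\]

Thus, $|\lambda| \le 1$.

It is possible for a Markov matrix, $M$, to have one or more eigenvalues $\lambda = -1$, in which case
Theorem 6.4.5 tells us that $\lim_{k\to\infty} M^k$ does not exist.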


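Example 6.4.14 The population movement transformation of Example 6.4.11 is a Markov matrix:
the entries of $E$ are nonnegative and each column sums to 1. Its transpose $E^T$ is not column
stochastic, because the rows of $E$ do not all sum to 1 (the first row sums to 0.98).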
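
Definition 6.4.12 translates directly into a short test. The sketch below, in plain Python, checks the matrix $E$ of Example 6.4.11 and its transpose; the function names `is_column_stochastic` and `transpose` and the tolerance `tol` (used to absorb floating-point rounding in the column sums) are hypothetical choices for this illustration.

```python
def is_column_stochastic(M, tol=1e-12):
    """True if M is square, has no negative entries, and every column sums to 1."""
    n = len(M)
    if any(len(row) != n for row in M):
        return False
    if any(entry < 0 for row in M for entry in row):
        return False
    return all(abs(sum(M[i][j] for i in range(n)) - 1.0) <= tol
               for j in range(n))

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

# Transition matrix from Example 6.4.11.
E = [[0.93, 0.02, 0.03, 0.00],
     [0.01, 0.67, 0.12, 0.03],
     [0.03, 0.30, 0.76, 0.18],
     [0.03, 0.01, 0.09, 0.79]]

print(is_column_stochastic(E))             # columns of E sum to 1
print(is_column_stochastic(transpose(E)))  # rows of E do not all sum to 1
```

Checking the transpose is the same as checking row sums, which is a separate property from column stochasticity.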


Example 6.4.15 State University has initiated a study on the stability of student fields of study. 
The University accepts students from four major fields of study: Humanities, Physical Sciences, 
Mathematics, and Engineering. Students are allowed to change their major field at any time. Data 
collected over the past several years reveals the overall trends in changes of major as follows. 
Suppose $x_j(k)$ is the number of students in year $k$ studying under major $j$ ($j = 1$, Humanities; $j = 2$,
Physical Sciences; $j = 3$, Mathematics; $j = 4$, Engineering). Then the change in one year is given by
$x(k + 1) = Ex(k)$ where

\[
E = \begin{pmatrix}
0.94 & 0.02 & 0.01 & 0.00 \\
0.02 & 0.85 & 0.14 & 0.02 \\
0.03 & 0.03 & 0.83 & 0.08 \\
0.01 & 0.10 & 0.02 & 0.90
\end{pmatrix}.
\]

For example, the entry 0.14 in the second row and third column indicates that 14% of Mathematics
majors will change to become Physical Science majors each year. The University would like to answer
two questions:


(Q1) If equal numbers of each major are admitted as first-year students (2500 of each major), what 
will be the distribution of majors at graduation four years later? 

(Q2) What distribution of majors should be admitted so that the same distribution exists at graduation 
four years later? 


To answer (Q1) directly, we can compute $x(4) = E^4 x(0)$ where

\[ x(0) = (2500\ \ 2500\ \ 2500\ \ 2500)^T. \]

The result is $x(4) \approx (2229\ \ 2692\ \ 2298\ \ 2781)^T$. The students have changed areas of study so that
there are more Physical Sciences and Engineering majors than were originally enrolled, and fewer 
Humanities and Mathematics majors. 

In order to answer (Q2), we can use the fact that $E$ is a Markov matrix (be sure to verify this
fact). We know that $E$ will have at least one eigenvalue $\lambda = 1$. Any eigenvector in the corresponding
eigenspace does not change in time. That is, $x(k + 1) = Ex(k) = \lambda x(k) = x(k)$. So, any initial student
distribution that is in the eigenspace of vectors with eigenvalue 1 will remain the same after four years (or
any number of years). $E$ has eigenvalues 1, 0.933, 0.793, 0.793, and the one-dimensional eigenspace
associated with $\lambda = 1$ contains the vector $v \approx (0.1342\ \ 0.2844\ \ 0.2363\ \ 0.3451)^T$. This vector $v$ is scaled
specifically so that the sum of the entries is 1. So, the entries represent the fraction of students of each
type of major that will be stable from year to year.
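
Both answers in Example 6.4.15 can be reproduced with repeated matrix-vector multiplication. The sketch below is plain Python; the helper name `apply` is hypothetical, and the stationary vector `v` is the rounded eigenvector quoted in the example, so it is only approximately fixed by $E$.

```python
# Yearly change-of-major matrix from Example 6.4.15 (each column sums to 1).
E = [[0.94, 0.02, 0.01, 0.00],
     [0.02, 0.85, 0.14, 0.02],
     [0.03, 0.03, 0.83, 0.08],
     [0.01, 0.10, 0.02, 0.90]]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

# (Q1) Start with 2500 students in each major and advance four years.
x = [2500.0, 2500.0, 2500.0, 2500.0]
for _ in range(4):
    x = apply(E, x)
x4 = [round(c) for c in x]
print(x4)  # [2229, 2692, 2298, 2781]

# (Q2) The quoted eigenvector for eigenvalue 1 should be (nearly) unchanged by E.
v = [0.1342, 0.2844, 0.2363, 0.3451]
Ev = apply(E, v)
print(max(abs(a - b) for a, b in zip(Ev, v)))
```

Scaling `v` by the total enrollment of 10,000 gives an admitted class whose distribution of majors is approximately stable from year to year.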


Path to New Applications

Researchers modeling dynamical processes with differential (and difference) equations often
follow paths similar to those discussed in this chapter for the Diffusion Welding example and
for Markov processes. The path to solving these problems typically requires that you find
eigenfunction solutions in order to find general solutions (solutions that are linear combinations
of the eigenfunctions). See Section 8.3.1 to learn more about connections between these tools
and applications.


6.4.4 Exercises 


1. Consider a matrix sequence $\mathcal{A} = \{A_1, A_2, \ldots\}$. For each of the following matrix definitions, find
the limit matrix $L = \lim_{k\to\infty} A_k$ if it exists.

(a)–(e) [matrix definitions not recoverable from the source]

(f) $A_k = \begin{pmatrix} 0.45^k & -0.36^k \\ 0 & 0.95^k \end{pmatrix}$

2. For each of the following matrices $M$, find $L = \lim_{k\to\infty} M^k$ if it exists. If $\lim_{k\to\infty} M^k$ does not
exist, then describe the non-convergent behavior.

(a)–(e) [matrices not recoverable from the source]

3. For each of the following matrices $E$, determine if it is column stochastic. If it is column stochastic,
then determine $\lim_{k\to\infty} E^k$.

[matrices not recoverable from the source]
4. Construct a $2 \times 2$ Markov matrix $M$ with eigenvalues $\lambda = \pm 1$. Show that $\lim_{k\to\infty} M^k$ does not
exist. Show that for some $v \in \mathbb{R}^2$, $\lim_{k\to\infty} M^k v$ does exist. Explain your findings in terms of the
eigenspaces and eigenvalues of $M$.

5. Consider a Markov matrix $M$ with eigenvalues $1 = \lambda_1 > \lambda_2 > \cdots > \lambda_{m-1} = \lambda_m = -1$. Show
that $\lim_{k\to\infty} M^k$ does not exist. For what $v \in \mathbb{R}^m$ does $v^* = \lim_{k\to\infty} M^k v$ exist? Explain your
findings in terms of the eigenspaces and eigenvalues of $M$.

6. Prove part (b) of Lemma 6.4.2.

7. Consider the population movement study (Example 6.4.11).

(a) Find $[v_0]_{\mathcal{Q}}$, where $\mathcal{Q} = \{Q_1, Q_2, Q_3, Q_4\}$, the given eigenbasis for $E$. Verify that $v_0$ has
components in each of the eigenspaces.

(b) Verify that the given population vectors $v_0, v_1, v_2$ give the distribution for the same number
of people (except for rounding uncertainty).

(c) How can the long-term contribution from the eigenspaces with $\lambda_j < 1$ diminish without $v_k$
losing population?


8. The Office of Voter Tracking has developed a simple linear model for predicting voting trends based
on political party affiliation. They classify voters according to the four categories (in alphabetical
order): Democrat, Independent, Libertarian, Republican. Suppose $x \in \mathbb{R}^4$ is the fraction of voters
who voted for the given parties in the last gubernatorial election. In the next election, OVT predicts
voting distribution as $Ex$ where

\[
E = \begin{pmatrix}
0.81 & 0.07 & 0.04 & 0.01 \\
0.08 & 0.64 & 0.01 & 0.08 \\
0.08 & 0.21 & 0.89 & 0.07 \\
0.03 & 0.08 & 0.06 & 0.84
\end{pmatrix}.
\]

Suppose $x = (0.43, 0.08, 0.06, 0.43)^T$, indicating that 43% voted for the Democratic candidate,
6% voted for the Libertarian candidate, etc. In the next election, OVT predicts the voting to be

\[
Ex = \begin{pmatrix}
0.81 & 0.07 & 0.04 & 0.01 \\
0.08 & 0.64 & 0.01 & 0.08 \\
0.08 & 0.21 & 0.89 & 0.07 \\
0.03 & 0.08 & 0.06 & 0.84
\end{pmatrix}
\begin{pmatrix} 0.43 \\ 0.08 \\ 0.06 \\ 0.43 \end{pmatrix}
\approx \begin{pmatrix} 0.361 \\ 0.121 \\ 0.135 \\ 0.384 \end{pmatrix}.
\]


(a) Verify that E is Markov. How would you interpret this property in terms of the problem 
description? 

(b) Determine whether $E^T$ is Markov. How would you interpret this property in terms of the problem
description?

(c) Find the long-term behavior of this voter tracking problem. Interpret your result. 


9. Consider a two-species ecosystem with a population of $x_1$ predators and a population of $x_2$ prey.
Each year the population of each group changes according to the combined birth/death rate and
the species interaction. Suppose the predatory species suffers from high death rates but flourishes
under large prey population. That is, suppose

\[ x_1(t + \Delta t) = \frac{1}{2} x_1(t) + \frac{2}{5} x_2(t). \]

Also, suppose that the prey species has a very robust birth rate but loses a large portion of its
population to the predator species. In particular,

\[ x_2(t + \Delta t) = -\frac{1}{4} x_1(t) + \frac{6}{5} x_2(t). \]

We then have the evolutionary system

\[ x(t + \Delta t) = \begin{pmatrix} 0.5 & 0.4 \\ -0.25 & 1.2 \end{pmatrix} x(t). \]

Suppose the year zero population is $x(0) = [100\ \ 200]^T$. Then we find some future yearly
populations

\[
x(1) = \begin{pmatrix} 0.5 & 0.4 \\ -0.25 & 1.2 \end{pmatrix} x(0) = \begin{pmatrix} 130 \\ 215 \end{pmatrix}, \quad
x(2) = \begin{pmatrix} 0.5 & 0.4 \\ -0.25 & 1.2 \end{pmatrix} x(1) \approx \begin{pmatrix} 151 \\ 226 \end{pmatrix}, \quad
x(3) = \begin{pmatrix} 0.5 & 0.4 \\ -0.25 & 1.2 \end{pmatrix} x(2) \approx \begin{pmatrix} 166 \\ 233 \end{pmatrix}.
\]

Using eigenvector/eigenvalue analysis, discuss the long-term behavior of the predator-prey
problem.
10. In Exercise 9 you investigated a predator-prey model for population growth. In this exercise you
will construct what is known as a phase diagram for the same situation that graphically displays
the model predictions for various initial populations. In $\mathbb{R}^2$, let the $x_1$-axis represent the population
of predators and the $x_2$-axis the population of prey, as in Exercise 9.

(a) Draw the eigenspaces and label them with the corresponding eigenvalues. Plot the population
trajectory for the first few years: $x(0), x(1), x(2), x(3)$. Using this sketch, predict the population
trend over the next several years and the long-term behavior.

(b) The eigenspaces cut the plane into four regions. What will happen if the initial population
vector lies in another of the four regions?

(c) How will the population evolve if the initial population lies in an eigenspace?


11. A virus has begun to spread in a large forest and trees are beginning to die. A study by the forest
management has revealed some details on how the state of the forest changes from year to year.
The state of the forest is given by a vector $x \in \mathbb{R}^4$ indicating what fraction of the acreage is healthy
($x_1$, currently unaffected by the virus), exposed ($x_2$, exposure to the virus, but resistant), sick ($x_3$,
actively fighting the virus), and dead ($x_4$). The study revealed the following details:


• 80% of healthy trees will remain healthy the next year; 20% will become exposed.

• Among exposed trees, next year 20% will become healthy, 10% will remain exposed, and 70%
will become sick.

• Among sick trees, next year 10% will be healthy, 30% will remain sick, and 60% will die.

• 90% of dead acres will remain dead in the next year; 10% will have healthy new growth.


(a) If the forest managers take no action, what will be the long-term health state of the forest? 

(b) Suppose the managers institute a program which boosts the forest resistance to the virus 
resulting in only 30% of exposed trees becoming sick the next year, with 60% becoming 
healthy and 10% remaining exposed. What will be the long-term health state of the forest in 
this scenario? 


The following exercises explore one-dimensional heat diffusion in modified scenarios similar to the 
diffusion welding application. 


12. Consider the one-dimensional heat diffusion operator for the situation in which the rod is thermally
insulated from its surroundings. No longer will the heat flow out the ends of the rod. In this situation,
heat at the ends of the rod can only flow in one direction and we have the matrix transformation
in the standard basis (compare Equation 4.3):

\[
E = \begin{pmatrix}
1-\delta & \delta & 0 & \cdots & 0 \\
\delta & 1-2\delta & \delta & & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & & \delta & 1-2\delta & \delta \\
0 & \cdots & 0 & \delta & 1-\delta
\end{pmatrix} \tag{6.6}
\]

In this scenario, temperatures are sampled at locations $x = 1/2, 3/2, 5/2, \ldots, m + 1/2$, and $E$ is
$(m+1) \times (m+1)$.

(a) Verify that $E$ is a Markov matrix. Is $E$ diagonalizable?

(b) Use the theoretical results of this section to characterize the eigenvalues of $E$.

(c) Find the eigenvalues and representative eigenvectors for the case $m = 3$. Do your results
support your answer of part (b)?

(d) Verify that the following set is a basis for $\mathbb{R}^{m+1}$ composed of eigenvectors of $E$:

\[ \mathcal{B} = \{w_0, w_1, \ldots, w_m\}, \]

where

\[
w_k = \left( \cos\frac{k\pi}{2(m+1)},\ \cos\frac{3k\pi}{2(m+1)},\ \cos\frac{5k\pi}{2(m+1)},\ \ldots,\ \cos\frac{(2m+1)k\pi}{2(m+1)} \right)^T.
\]

(e) Compute the eigenvalues of $E$.

(f) Describe the long-term heat diffusion behavior for this scenario.


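13. Consider the one-dimensional heat diffusion operator for the situation in which some heat from the
rod is lost by advection (heat loss to the atmosphere). In this case we have the matrix transformation
in the standard basis:

\[
\hat{E} = \begin{pmatrix}
1-2\delta-\epsilon & \delta & 0 & \cdots & 0 \\
\delta & 1-2\delta-\epsilon & \delta & & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & & \delta & 1-2\delta-\epsilon & \delta \\
0 & \cdots & 0 & \delta & 1-2\delta-\epsilon
\end{pmatrix} \tag{6.7}
\]

where $\epsilon$ is a small positive constant.

(a) Interpret the meaning of the constants $\delta$ and $\epsilon$.

(b) Is $\hat{E}$ a Markov matrix? Is $\hat{E}$ diagonalizable?

(c) Use the theoretical results of this section to characterize the eigenvalues of $\hat{E}$.

(d) Find the eigenvalues of $\hat{E}$ and an eigenbasis for $\hat{E}$. Use the observation that $\hat{E} = E - \epsilon I_m$,
where $E$ is the heat diffusion operator of the diffusion welding application, and your knowledge
of the eigenvectors and eigenvalues of $E$.

(e) Describe the long-term heat diffusion behavior for this scenario.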



7 Inner Product Spaces and Pseudo-Invertibility


The tomography problem—finding a brain image reconstruction based on radiographic data—has been 
challenging for a number of reasons. We have found that the radiographic transformation is, in general, 
not invertible, and experimental data need not be in its range. This means that if we assume the given 
linear relationship, then we may either have no solution or infinitely many solutions. We should not 
be satisfied with such an answer. Instead, we use this result to motivate a deeper study. 

Given a non-invertible linear transformation $T$ and co-domain data $b$, we consider the set of solutions
$X = \{x \in \mathbb{R}^n \mid Ax = b\}$, where $A$ is the $m \times n$ matrix representation of $T$ relative to relevant bases
for the domain and co-domain of $T$. There are two cases to consider.


• If $X$ is not empty, then there are infinitely many solutions. Which among them is the best choice?
• If $X$ is empty, then which domain vector is closest to being a solution?


The answers to such questions are very important for any application. If, for example, our brain scan 
radiographs are not in the range of our transformation, then it is important to provide a “best” or “most 
likely” brain image even if it does not satisfy the transformation equation. 

These considerations require us to understand a basic comparative relationship among vectors in a 
vector space. Questions to consider: 


1. What is the length of a brain image vector?
2. What would be the distance between two radiographs?
3. How similar are two brain images? Two radiographs?
4. How might we project a vector onto a subspace of pre-determined acceptable solutions?
5. What is the vector in an acceptable set which is closest to the set of solutions to a linear transformation?


In this chapter, we explore these key questions and related ideas. In the end, we will be able to find the 
best possible solutions to any linear transformation equation with bounds on the quality of our results. 
We will reconstruct high-quality brain images from very few moderately noisy radiographs. 


© Springer Nature Switzerland AG 2022
H. A. Moon et al., Application-Inspired Linear Algebra, Springer Undergraduate Texts in Mathematics and Technology, https://doi.org/10.1007/978-3-030-86155-1_7


7.1 Inner Products, Norms, and Coordinates


We have learned to recognize linearly independent sets and bases for finite-dimensional vector spaces.
These ideas have been quite useful in categorizing elements of vector spaces and subspaces. Linear
combinations of linearly independent vectors provide a tool for cataloging vectors through coordinate
space representations. We can even find matrix representations of linear transformations between
coordinate spaces. In this section, we explore additional questions that bring vector spaces into sharper
focus:


• What is the degree to which two vectors are linearly independent?
• Can we develop a way to measure the “length” of a vector? What about the angle between two vectors?


Consider the following three grayscale image vectors in $\mathcal{I}_{4\times 4}(\mathbb{R})$:


The set {x, y, z} is linearly independent and the span of the set is a subspace of dimension three. Also, 
since the set is linearly independent, any subset is also linearly independent. However, we intuitively 
see that the set {x, y} is somehow “more” linearly independent than the set {y, z} because images y
and z are very similar in grayscale intensity distribution. We might colloquially say that the set {y, z}
is “nearly linearly dependent” because Span{y, z} may not seem to describe as rich a subspace as 
Span{x, y}. 

Consider a second example of three vectors in $\mathbb{R}^2$ illustrated here:


We can make observations analogous to the previous example. We see that {y, z} is nearly linearly 
dependent, whereas we would not be so quick to make this judgment about the set {x, y}. We say that 
y and z are nearly linearly dependent because they have a nearly common direction. 

In this section, we introduce the inner product as a tool for measuring the degree of linear
independence between vectors. This tool will allow us to (1) define the distance between vectors,
(2) define the length of a vector, and (3) define the angle between two vectors.
7.1.1 Inner Product


Motivating example: the dot product in $\mathbb{R}^n$


Building on the previous example of vectors in $\mathbb{R}^2$, let $\theta$ be the positive angle between two vectors $x$
and $y$ in $\mathbb{R}^n$. We expect that vectors that are nearly linearly dependent are separated by angles that are
close to zero radians or close to $\pi$ radians. In these cases, the vectors point nearly along the same line,
and $|\cos\theta| \approx 1$. On the other hand, if $\theta \approx \pi/2$, then the two vectors are very much not scalar multiples
of each other. In fact, they point nearly along perpendicular lines and $\cos\theta \approx 0$.

Angles play a large role in quantifying the “degree of linear dependence,” but also, since the zero 
vector is not part of any linearly independent set, we expect that the length of a vector should also be 
considered. Recall that the length of a vector $x = (x_1, \ldots, x_n)$ in $\mathbb{R}^n$ is given by the formula

\[
\ell(x) = \left(x_1^2 + \cdots + x_n^2\right)^{1/2}.
\]


It turns out that in $\mathbb{R}^2$, and indeed in $\mathbb{R}^n$ in general, the familiar dot product of two vectors encodes
information about the degree of linear (in)dependence of the vectors, and hence also about the lengths 
and the angle between the vectors. 

Recall that the dot product is defined as

\[
x \cdot y := x_1 y_1 + \cdots + x_n y_n. \tag{7.1}
\]

This yields an expression for the length of a vector in terms of the dot product of the vector with itself:
$x \cdot x = \ell(x)^2$. Moreover, by the law of cosines,

\[
x \cdot y = \ell(x)\,\ell(y)\cos\theta. \tag{7.2}
\]


Notice that the dot product is symmetric ($x \cdot y = y \cdot x$) and linear in the first argument (see
Exercise 18), as we would hope for a function that quantifies the “similarity” between vectors.

Hence the dot product is a useful measure of the “degree of linear dependence.” Roughly speaking,
larger dot products correspond to larger vectors in similar directions, while smaller dot products
correspond to smaller vectors or vectors in very different directions.


Inner product 


While this works well in $\mathbb{R}^n$, we would like to extend these ideas to general finite-dimensional vector
spaces. Our approach is to define a general similarity function, called an inner product, that has the 
properties we desire. 


Definition 7.1.1 


Let $(V, +, \cdot)$ be a vector space with scalars in a field $\mathbb{F}$. An inner product is a mapping
$\langle\cdot,\cdot\rangle : V \times V \to \mathbb{F}$ that satisfies the following three properties. For every $u, v, w \in V$ and $\alpha \in \mathbb{F}$:

1. $\langle u, v\rangle = \langle v, u\rangle$ (Symmetric)
2. $\langle \alpha u + v, w\rangle = \alpha\langle u, w\rangle + \langle v, w\rangle$ (Linearity in the first argument)
3. $\langle u, u\rangle \geq 0$, and $\langle u, u\rangle = 0$ if and only if $u = 0$. (Positive definite)

A vector space with an inner product is called an inner product space.


The notation $\langle\cdot,\cdot\rangle$ indicates an inner product that allows inputs where the two dots are located.
Also, the above definition of a (real) inner product assumes $\mathbb{F} = \mathbb{R}$ or $\mathbb{F} = \mathbb{Z}_2$. Complex inner product
spaces are considered in the exercises for the interested reader. This definition does not preclude the
possibility that we could define more than one inner product on a vector space.
In Section 7.1.2, we will use inner products to define notions of length and angle between vectors.
We now consider several examples of inner products on some familiar vector spaces.


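Example 7.1.2 Let us define $\langle\cdot,\cdot\rangle : \mathcal{M}_{3\times 2}(\mathbb{R}) \times \mathcal{M}_{3\times 2}(\mathbb{R}) \to \mathbb{R}$ so that $\langle u, v\rangle$ is the sum of the
component-wise products of the matrices $u$ and $v$. That is, if

\[
u = \begin{pmatrix} a & b \\ c & d \\ e & f \end{pmatrix}
\quad\text{and}\quad
v = \begin{pmatrix} g & h \\ j & k \\ \ell & m \end{pmatrix},
\]

then
\[
\langle u, v\rangle = ag + bh + cj + dk + e\ell + fm.
\]

We will check that the properties of Definition 7.1.1 hold. Notice that because multiplication in $\mathbb{R}$ is
commutative, $\langle u, v\rangle = \langle v, u\rangle$. Now, let $\alpha \in \mathbb{R}$ and

\[
w = \begin{pmatrix} n & p \\ q & r \\ s & t \end{pmatrix}.
\]

Then, because multiplication distributes over addition, we see that

\[
\begin{aligned}
\langle \alpha u + v, w\rangle &= (\alpha a + g)n + (\alpha b + h)p + (\alpha c + j)q + (\alpha d + k)r + (\alpha e + \ell)s + (\alpha f + m)t \\
&= \alpha(an + bp + cq + dr + es + ft) + (gn + hp + jq + kr + \ell s + mt) \\
&= \alpha\langle u, w\rangle + \langle v, w\rangle.
\end{aligned}
\]

Therefore, the linearity condition holds.
Finally, we compute
\[
\langle u, u\rangle = a^2 + b^2 + c^2 + d^2 + e^2 + f^2 \geq 0.
\]

And, $\langle u, u\rangle = 0$ if and only if $a = b = c = d = e = f = 0$. That is, $u$ is the $3 \times 2$ zero matrix. Thus,
this is indeed an inner product on $\mathcal{M}_{3\times 2}(\mathbb{R})$.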


Example 7.1.3 Let $\langle\cdot,\cdot\rangle : \mathcal{P}_2(\mathbb{R}) \times \mathcal{P}_2(\mathbb{R}) \to \mathbb{R}$ be defined by

\[
\langle p_1, p_2\rangle = \int_0^1 p_1 p_2 \, dx, \quad \text{for } p_1, p_2 \in \mathcal{P}_2(\mathbb{R}).
\]

Here, $p_1 p_2$ is the point-wise product: $p_1 p_2(x) = p_1(x)p_2(x)$. For the purposes of this text, we will
call this inner product the standard inner product¹ for $\mathcal{P}_2(\mathbb{R})$. Notice again that because polynomial
multiplication is commutative, the symmetric property holds:

\[
\langle p_1, p_2\rangle = \langle p_2, p_1\rangle.
\]

Next, we show that this inner product is linear in the first argument:

\[
\langle \alpha p_1 + p_2, p_3\rangle = \int_0^1 (\alpha p_1 + p_2)p_3 \, dx = \alpha\int_0^1 p_1 p_3 \, dx + \int_0^1 p_2 p_3 \, dx = \alpha\langle p_1, p_3\rangle + \langle p_2, p_3\rangle.
\]

Finally, if $p \in \mathcal{P}_2(\mathbb{R})$ then $\langle p, p\rangle = \int_0^1 p^2(x)\, dx \geq 0$. If $\langle p, p\rangle = 0$ then $\int_0^1 p^2(x)\, dx = 0$, which
means $p$ must be the zero polynomial. So, we now have an inner product on $\mathcal{P}_2(\mathbb{R})$. This example can
be readily extended to any polynomial space.

¹ This may not be the most useful inner product for various applications. And, in other contexts, you may see more useful
inner products on polynomial spaces.


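Example 7.1.4 Let $x, y \in \mathbb{R}^n$. We define the standard inner product on $\mathbb{R}^n$ as the dot product:
$\langle x, y\rangle = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$.

Consider the vectors from the introduction to this section,
$x = (-8, 8)$, $y = (6, 4)$, $z = (5, 2)$.

We have²

\[
\begin{aligned}
\langle x, y\rangle &= (-8)(6) + (8)(4) = -16 \\
\langle x, z\rangle &= (-8)(5) + (8)(2) = -24 \\
\langle y, z\rangle &= (6)(5) + (4)(2) = 38 \\
\langle x, x\rangle &= (-8)(-8) + (8)(8) = 128 \\
\langle y, y\rangle &= (6)(6) + (4)(4) = 52 \\
\langle z, z\rangle &= (5)(5) + (2)(2) = 29.
\end{aligned}
\]

² Do the following inner product computations align with your geometric intuition?


The next theorem gives a method for writing the standard inner product on $\mathbb{R}^n$ as a matrix product.


Theorem 7.1.5
Let $x, y \in \mathbb{R}^n$ and let $\langle\cdot,\cdot\rangle$ be the standard inner product on $\mathbb{R}^n$. Then

\[
\langle x, y\rangle = x^T y.
\]

Proof. Let $\langle\cdot,\cdot\rangle$ be the standard inner product on $\mathbb{R}^n$ and denote $x, y \in \mathbb{R}^n$ by

\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.
\]

Then,

\[
\langle x, y\rangle = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n
= \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
= x^T y.
\]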

The standard inner product on image spaces is similar to the standard inner product on $\mathbb{R}^n$. For
example, suppose images $I, J \in \mathcal{I}_{4\times 4}(\mathbb{R})$ have ordered entries $I_k$ and $J_k$ for $k = 1, 2, \ldots, 16$. We
define the standard inner product $\langle I, J\rangle = \sum_{k=1}^{16} I_k J_k$.


Example 7.1.6 Consider the three images from the introduction to this section, with numerical grayscale entries

\[
x = \begin{pmatrix} 95 & 89 & 82 & 92 \\ 23 & 76 & 44 & 74 \\ 61 & 46 & 62 & 18 \\ 49 & 2 & 79 & 41 \end{pmatrix}, \quad
y = \begin{pmatrix} 94 & 6 & 14 & 27 \\ 92 & 35 & 20 & 20 \\ 41 & 81 & 20 & 2 \\ 89 & 1 & 60 & 75 \end{pmatrix}, \quad
z = \begin{pmatrix} 102 & 22 & 30 & 23 \\ 109 & 45 & 21 & 29 \\ 50 & 85 & 33 & 15 \\ 97 & 14 & 67 & 83 \end{pmatrix}.
\]

Using the standard inner product, we compute

\[
\begin{aligned}
\langle x, y\rangle &= 39{,}913, & \langle x, z\rangle &= 47{,}974, & \langle y, z\rangle &= 51{,}881, \\
\langle x, x\rangle &= 66{,}183, & \langle y, y\rangle &= 46{,}079, & \langle z, z\rangle &= 59{,}527.
\end{aligned}
\]

From these inner products, we can define image vector lengths and angles between vectors according
to the ideas behind Equation 7.2. We set $\ell(x) = \sqrt{\langle x, x\rangle}$ and $\cos\theta_{x,y} = \langle x, y\rangle / (\ell(x)\ell(y))$:

\[
\ell(x) \approx 257, \quad \ell(y) \approx 215, \quad \ell(z) \approx 244,
\]

\[
\cos\theta_{x,y} \approx 0.723, \quad \cos\theta_{x,z} \approx 0.764, \quad \cos\theta_{y,z} \approx 0.991.
\]

As we predicted (or because we constructed a good measure of linear dependence), $y$ and $z$ are nearly
linearly dependent by the cosine measure.
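The computations in Example 7.1.6 can be reproduced with a few lines of Python. This is only a sketch: the pixel values are read from the example (entries that were hard to read were chosen so that all six printed inner products are reproduced), each image is flattened row by row, and the helper names `inner` and `cosine` are illustrative rather than notation from the text.

```python
import math

# Grayscale entries of the three 4 x 4 images, flattened row by row.
x = [95, 89, 82, 92, 23, 76, 44, 74, 61, 46, 62, 18, 49, 2, 79, 41]
y = [94, 6, 14, 27, 92, 35, 20, 20, 41, 81, 20, 2, 89, 1, 60, 75]
z = [102, 22, 30, 23, 109, 45, 21, 29, 50, 85, 33, 15, 97, 14, 67, 83]


def inner(u, v):
    """Standard image inner product: sum of entry-wise products."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine of the angle between u and v, as in Equation 7.2."""
    return inner(u, v) / math.sqrt(inner(u, u) * inner(v, v))


for name, (u, v) in {"x,y": (x, y), "x,z": (x, z), "y,z": (y, z)}.items():
    print(name, inner(u, v), round(cosine(u, v), 3))
```

The pair with cosine closest to 1 is the pair we judged by eye to be nearly linearly dependent.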


7.1.2 Vector Norm


As in Example 7.1.6, we can use an inner product on a vector space to define vector length and
angle between vectors. We want length and angle to match what we would compute in $\mathbb{R}^n$ with the dot
product. In $\mathbb{R}^n$, the length of $x$ is

\[
\ell(x) = (x \cdot x)^{1/2} \tag{7.3}
\]

and the angle $\theta$ between vectors $x$ and $y$ is defined to be

\[
\theta = \cos^{-1}\left(\frac{x \cdot y}{\ell(x)\,\ell(y)}\right).
\]

In a general inner product space, this yields the following: 


Definition 7.1.7 


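Let $V$ be an inner product space with inner product $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$. We define the norm of
$u \in V$, denoted $\|u\|$, by
\[
\|u\| = \langle u, u\rangle^{1/2}.
\]


Notice that if we consider the standard inner product on $\mathbb{R}^n$, the norm of a vector corresponds to
the Euclidean length. If $u = (u_1, u_2, \ldots, u_n)$, then

\[
\|u\| = \langle u, u\rangle^{1/2} = \left(\sum_{j=1}^{n} u_j^2\right)^{1/2} = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}.
\]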


Definition 7.1.8 


A vector v in an inner product space is said to be a unit vector if ||v|| = 1. 


Example 7.1.9 Consider the inner product computations in Example 7.1.4. We find the length of
each vector:

\[
\begin{aligned}
\|x\| &= \sqrt{\langle x, x\rangle} = \sqrt{128} = 8\sqrt{2} \approx 11.3, \\
\|y\| &= \sqrt{\langle y, y\rangle} = \sqrt{52} = 2\sqrt{13} \approx 7.21, \\
\|z\| &= \sqrt{\langle z, z\rangle} = \sqrt{29} \approx 5.39.
\end{aligned}
\]

We can also compute the cosine of the angle between pairs of vectors. For example,

\[
f(x, y) = \cos\theta_{x,y} = \frac{\langle x, y\rangle}{\|x\|\,\|y\|} = \frac{-16}{(8\sqrt{2})(2\sqrt{13})} \approx -0.196,
\]

\[
f(y, z) = \cos\theta_{y,z} = \frac{\langle y, z\rangle}{\|y\|\,\|z\|} = \frac{38}{(2\sqrt{13})(\sqrt{29})} \approx 0.979.
\]

Notice that $|f(y, z)| > |f(x, y)|$, which indicates that {y, z} is closer to being linearly dependent than
{x, y}.
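The arithmetic of Examples 7.1.4 and 7.1.9 is easy to check numerically. The sketch below uses only the Python standard library; the function names `inner`, `norm`, and `cosine` are illustrative choices and not notation from the text.

```python
import math


def inner(u, v):
    """Standard inner product (dot product) on R^n."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Norm induced by the standard inner product."""
    return math.sqrt(inner(u, u))


def cosine(u, v):
    """Cosine of the angle between two nonzero vectors."""
    return inner(u, v) / (norm(u) * norm(v))


x, y, z = (-8, 8), (6, 4), (5, 2)

print(inner(x, y), inner(x, z), inner(y, z))   # pairwise inner products
print(round(norm(x), 2), round(norm(y), 2), round(norm(z), 2))
print(round(cosine(x, y), 3), round(cosine(y, z), 3))
```

A cosine near $\pm 1$ signals a nearly linearly dependent pair, while a cosine near 0 signals vectors that are close to perpendicular.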


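Example 7.1.10 Reconsider Example 7.1.3. We can find the “length” of a polynomial $p \in \mathcal{P}_2(\mathbb{R})$.
Let $p = ax^2 + bx + c$. Then

\[
\begin{aligned}
\|p\| &= \left(\int_0^1 (ax^2 + bx + c)^2 \, dx\right)^{1/2} \\
&= \left(\int_0^1 a^2x^4 + 2abx^3 + (2ac + b^2)x^2 + 2bcx + c^2 \, dx\right)^{1/2} \\
&= \left(\frac{a^2}{5} + \frac{ab}{2} + \frac{2ac + b^2}{3} + bc + c^2\right)^{1/2}.
\end{aligned}
\]

Notice that $f(x) = \sqrt{5}\,x^2$ is a unit vector in $\mathcal{P}_2(\mathbb{R})$ with this inner product. For this function, $a = \sqrt{5}$
and $b = c = 0$, so that $\|f\| = 1$.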
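The norm formula above can be checked with exact rational arithmetic. In the sketch below, a polynomial is stored as a list of coefficients `[c0, c1, c2, ...]` (constant term first); this representation and the name `poly_inner` are assumptions made for illustration. The code relies on the fact that $\int_0^1 x^i x^j \, dx = 1/(i+j+1)$.

```python
from fractions import Fraction


def poly_inner(p, q):
    """Inner product <p, q> = integral of p*q over [0, 1].

    p and q are coefficient lists [c0, c1, c2, ...], constant term first.
    """
    return sum(Fraction(a) * Fraction(b) / (i + j + 1)
               for i, a in enumerate(p)
               for j, b in enumerate(q))


# p(x) = x^2 + 2x + 3, that is a = 1, b = 2, c = 3 in the formula above.
a, b, c = 1, 2, 3
p = [c, b, a]
formula = (Fraction(a * a, 5) + Fraction(a * b, 2)
           + Fraction(2 * a * c + b * b, 3) + b * c + c * c)
print(poly_inner(p, p), formula)          # the two values agree

# <x^2, x^2> = 1/5, so scaling x^2 by sqrt(5) gives a unit vector.
print(5 * poly_inner([0, 0, 1], [0, 0, 1]))
```

Working with `Fraction` avoids rounding error, so the comparison with the closed-form expression is exact.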


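Example 7.1.11 Consider the vector space of 7-bar LCD characters, $D(\mathbb{Z}_2)$. Let $f : D \times D \to \mathbb{Z}_2$ be
defined by

\[
f(x, y) = \begin{cases} 1 & \text{if } x = y \neq 0 \\ 0 & \text{otherwise.} \end{cases}
\]

We can show that $f$ is not an inner product on $D(\mathbb{Z}_2)$. Clearly, $f$ is symmetric and

\[
f(\alpha x, y) = \begin{cases} 1 & \text{if } \alpha = 1 \text{ and } x = y \neq 0 \\ 0 & \text{otherwise} \end{cases} \;=\; \alpha f(x, y).
\]

We also have positive definiteness: $f(x, x) = 1$ if $x \neq 0$ and $f(x, x) = 0$ if $x = 0$. However,
the additivity condition $f(x + y, z) = f(x, z) + f(y, z)$ does not hold for all $x, y, z \in D$. This is the subject of Exercise 14.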


The next example shows that a vector space can have many different inner products and many 
different associated norms. 


Example 7.1.12 Consider the vector space $\mathbb{R}^2$ and define $\langle x, y\rangle = x_1 y_1 + b x_2 y_2$, where $b > 0$ is a
scalar. We can show that $\langle\cdot,\cdot\rangle : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ is an inner product (see Exercise 9). Every positive
value of $b$ defines a unique inner product on $\mathbb{R}^2$ and a unique norm. With this inner product, the length
of a vector $x = (x_1, x_2)$ is $\|x\| = \sqrt{x_1^2 + b x_2^2}$. The set of unit vectors is the solution set of the ellipse
equation $x_1^2 + b x_2^2 = 1$. The sets of unit vectors for $b = 1$ and $b = 4$ are shown in Figure 7.1.


Inspired by this last example, we would like to find a generalization of the standard inner product
on $\mathbb{R}^n$. We first need to consider a class of matrices with specific properties, which we outline here.


Definition 7.1.13 


Let $A$ be an $n \times n$ matrix. We say that $A$ is symmetric if and only if $A^T = A$.

Fig. 7.1 Two example sets of unit vectors in $\mathbb{R}^2$ corresponding to the inner products $\langle x, y\rangle = x_1 y_1 + x_2 y_2$ (circle) and
$\langle x, y\rangle = x_1 y_1 + 4x_2 y_2$ (ellipse).


A symmetric matrix is symmetric with respect to reflection across the main diagonal. That is, if $a_{ij}$
for $1 \leq i, j \leq n$ is the $(i, j)$-entry of $A$, then $a_{ij} = a_{ji}$. For example,
\[
M = \begin{pmatrix} 2 & 3 & 5 \\ 3 & -1 & 7 \\ 5 & 7 & 0 \end{pmatrix}
\]


is a symmetric matrix, but 


is not because $a_{3,2} \neq a_{2,3}$.


Definition 7.1.14 


We say that a real-valued square matrix A is positive definite if and only if A is symmetric and all 
the eigenvalues of A are positive. 


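Example 7.1.15 Let $M$ be the matrix given by
\[
M = \begin{pmatrix} 4 & 1 & 1 \\ 1 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}.
\]

Using ideas from Section 6.2, we find linearly independent eigenvectors

\[
v_1 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}, \quad\text{and}\quad
v_3 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix},
\]

with corresponding eigenvalues $\lambda_1 = 3$, $\lambda_2 = 4$, and $\lambda_3 = 7$, respectively. Because $M$ is symmetric
and has all positive eigenvalues, $M$ is positive definite.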
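The two conditions of Definition 7.1.14 can be verified directly for this matrix. The short sketch below checks symmetry and confirms each eigenpair by comparing $Mv$ with $\lambda v$; the helper name `matvec` is an illustrative choice, and the eigenpairs are simply the ones stated in the example.

```python
M = [[4, 1, 1],
     [1, 6, 1],
     [1, 1, 4]]

# (eigenvalue, eigenvector) pairs stated in Example 7.1.15.
pairs = [(3, [1, 0, -1]), (4, [1, -1, 1]), (7, [1, 2, 1])]


def matvec(A, v):
    """Matrix-vector product A v for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


n = len(M)
is_symmetric = all(M[i][j] == M[j][i] for i in range(n) for j in range(n))
eigenpairs_ok = all(matvec(M, v) == [lam * c for c in v] for lam, v in pairs)
all_positive = all(lam > 0 for lam, _ in pairs)

print(is_symmetric, eigenpairs_ok, all_positive)
```

Since a $3 \times 3$ matrix has at most three distinct eigenvalues, confirming these three eigenpairs accounts for every eigenvalue of $M$.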


The next theorem provides a characterization of inner products on $\mathbb{R}^n$.

Theorem 7.1.16
Let $x, y \in \mathbb{R}^n$ and $f : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. Then, $f$ is an inner product on $\mathbb{R}^n$ if and only if $f(x, y) = x^T A y$ for some positive definite $n \times n$ matrix $A$.


The proof of Theorem 7.1.16 is the subject of Exercise 19.

Theorem 7.1.16 provides a useful generalization of the concept of vector length in $\mathbb{R}^n$. For a suitable
matrix $A$, the length of $x \in \mathbb{R}^n$ is given by $\|x\| = \sqrt{x^T A x}$. If $A$ is the $n \times n$ identity matrix, we recover
the standard notion of length $\|x\| = \sqrt{x^T I x} = \sqrt{x^T x} = \sqrt{x_1^2 + \cdots + x_n^2}$.

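The inner product family of Example 7.1.12 can be written in the form stated in Theorem 7.1.16:
$\langle x, y\rangle = x^T A y$ for the symmetric positive definite matrix

\[
A = \begin{pmatrix} 1 & 0 \\ 0 & b \end{pmatrix}.
\]

(Recall: $b > 0$.) We have

\[
x^T A y = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & b \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
= \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} y_1 \\ b y_2 \end{pmatrix}
= x_1 y_1 + b x_2 y_2.
\]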
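To see the matrix form $x^T A y$ at work, the following sketch evaluates the weighted inner product of Example 7.1.12 for $b = 4$ and compares it with the formula $x_1 y_1 + b x_2 y_2$. The function name `inner_A` and the sample vectors are assumptions chosen for illustration.

```python
import math


def inner_A(x, A, y):
    """Weighted inner product x^T A y for a list-of-rows matrix A."""
    Ay = [sum(a * yj for a, yj in zip(row, y)) for row in A]
    return sum(xi * ci for xi, ci in zip(x, Ay))


b = 4
A = [[1, 0],
     [0, b]]

x, y = (3, -1), (2, 5)
print(inner_A(x, A, y), x[0] * y[0] + b * x[1] * y[1])  # both forms agree

# Points on the ellipse x1^2 + 4 x2^2 = 1 are unit vectors for this norm.
for u in [(1.0, 0.0), (0.0, 0.5), (math.cos(1.0), 0.5 * math.sin(1.0))]:
    print(u, math.sqrt(inner_A(u, A, u)))
```

The vector $(0, 1/2)$ has Euclidean length $1/2$ but length 1 in the weighted norm, which is why the set of unit vectors for $b = 4$ is an ellipse rather than a circle.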


7.1.3 Properties of Inner Product Spaces 


The definition of an inner product on vector spaces formalizes the concept of degree of linear 
dependence. From this idea came a general notion of vector length and angles between vectors. In 
our study of inner product spaces, we will discover several other useful properties of the inner product 
and of vector lengths including the familiar triangle inequality and a general concept of perpendicular 
(orthogonal) vectors. First, we consider several symmetry and uniqueness properties. 


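Theorem 7.1.17
Let $V$ be a real inner product space. Then for $u, v, w \in V$ and $c \in \mathbb{R}$, the following statements
hold.

1. $\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle$.
2. $\langle u, cv\rangle = \langle cu, v\rangle = c\langle u, v\rangle$.
3. $\langle u, 0\rangle = \langle 0, u\rangle = 0$.
4. If $\langle u, v\rangle = \langle u, w\rangle$ for all $u \in V$, then $v = w$.


Proof. Let $V$ be a real inner product space, $u, v, w \in V$ and $c \in \mathbb{R}$. We use the three properties given
in Definition 7.1.1 of real inner product spaces to prove each statement.

1. Using the symmetry of $\langle\cdot,\cdot\rangle$ and linearity in the first argument, we have

\[
\begin{aligned}
\langle u, v + w\rangle &= \langle v + w, u\rangle \\
&= \langle v, u\rangle + \langle w, u\rangle \\
&= \langle u, v\rangle + \langle u, w\rangle.
\end{aligned}
\]

2. Again, using the symmetry property, we have

\[
\begin{aligned}
\langle u, cv\rangle &= \langle cv, u\rangle \\
&= c\langle v, u\rangle \\
&= c\langle u, v\rangle \\
&= \langle cu, v\rangle.
\end{aligned}
\]

3. Let $c = 0$ in statement 2 of the theorem. Then, using linearity in the first argument, we have

\[
\langle u, 0\rangle = \langle 0, u\rangle = 0.
\]

4. Suppose $\langle u, v\rangle = \langle u, w\rangle$ for all $u \in V$. Then

\[
\begin{aligned}
0 &= \langle u, v\rangle - \langle u, w\rangle \\
&= \langle u, v\rangle + \langle u, -w\rangle \\
&= \langle u, v - w\rangle.
\end{aligned}
\]

Since the above is true for any $u$, it must be true for $u = v - w$. Therefore, $\langle v - w, v - w\rangle = 0$,
which implies that $v - w = 0$. Thus, $v = w$.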


Theorem 7.1.17 parts (1) and (2) tell us that inner products on real vector spaces are also linear in the
second argument. Some of these properties are specific to real inner product spaces. Exercises 25 to 27
consider the corresponding properties of complex inner product spaces.

Next, we consider properties related to vector length. In particular, we will be satisfied to learn that 
vector length scales appropriately and the familiar triangle inequality holds. We also find that vector 
length is non-negative and is only zero when the vector itself is the zero vector of the vector space. 


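Theorem 7.1.18
Let $V$ be a real inner product space. Then for $u, v \in V$ and $c \in \mathbb{R}$, the following statements hold.

1. $\|cu\| = |c|\,\|u\|$.
2. $\|u\| \geq 0$, and $\|u\| = 0$ if and only if $u = 0$.
3. $|\langle u, v\rangle| \leq \|u\|\,\|v\|$. (Cauchy-Schwarz Inequality)
4. $\|u + v\| \leq \|u\| + \|v\|$. (Triangle Inequality)


Proof. Let $V$ be a real inner product space, $u, v \in V$ and $c \in \mathbb{R}$. We use Definition 7.1.1 and properties
of a real inner product space (Theorem 7.1.17) to prove each statement.

1. First, we have

\[
\|cu\| = \langle cu, cu\rangle^{1/2} = \left(c^2\langle u, u\rangle\right)^{1/2} = |c|\,\langle u, u\rangle^{1/2} = |c|\,\|u\|.
\]

2. Now, we know that $\|u\| = \langle u, u\rangle^{1/2}$. Thus, the result follows from the positive definite property of
the inner product on $V$.

3. If $v = 0$ then the result follows trivially. Otherwise, let $\mu = \langle u, v\rangle / \langle v, v\rangle$. We will show that the
inequality $0 \leq \|u - \mu v\|^2$ leads to the desired relation. Using Theorem 7.1.17 and the linearity
condition for inner products, we have

\[
\begin{aligned}
0 &\leq \|u - \mu v\|^2 \\
&= \langle u - \mu v, u - \mu v\rangle \\
&= \langle u, u\rangle - 2\langle u, \mu v\rangle + \langle \mu v, \mu v\rangle \\
&= \langle u, u\rangle - 2\mu\langle u, v\rangle + \mu^2\langle v, v\rangle \\
&= \langle u, u\rangle - 2\frac{\langle u, v\rangle^2}{\langle v, v\rangle} + \frac{\langle u, v\rangle^2\langle v, v\rangle}{\langle v, v\rangle^2} \\
&= \langle u, u\rangle - \frac{\langle u, v\rangle^2}{\langle v, v\rangle} \\
&= \|u\|^2 - \frac{\langle u, v\rangle^2}{\|v\|^2}.
\end{aligned}
\]

Thus, $\|u\|\,\|v\| \geq |\langle u, v\rangle|$.

4. We now use the Cauchy-Schwarz inequality (Property (3) of this theorem) to establish the triangle
inequality.

\[
\begin{aligned}
\|u + v\| &= \langle u + v, u + v\rangle^{1/2} \\
&= \left(\langle u, u\rangle + \langle v, v\rangle + 2\langle u, v\rangle\right)^{1/2} \\
&= \left(\|u\|^2 + \|v\|^2 + 2\langle u, v\rangle\right)^{1/2} \\
&\leq \left(\|u\|^2 + \|v\|^2 + 2\|u\|\,\|v\|\right)^{1/2} \\
&= \|u\| + \|v\|.
\end{aligned}
\]

Therefore, the triangle inequality holds.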
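A numerical spot check is no substitute for the proof, but it can build intuition. The sketch below tests the four statements of Theorem 7.1.18 for the vectors $x = (-8, 8)$ and $y = (6, 4)$ of Example 7.1.4 under the standard inner product; the scalar $c = -3$ and the function names are arbitrary illustrative choices.

```python
import math


def inner(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Norm induced by the standard inner product."""
    return math.sqrt(inner(u, u))


x, y = (-8, 8), (6, 4)
c = -3

scaled = tuple(c * xi for xi in x)
total = tuple(xi + yi for xi, yi in zip(x, y))

scaling_holds = math.isclose(norm(scaled), abs(c) * norm(x))
cauchy_schwarz_holds = abs(inner(x, y)) <= norm(x) * norm(y)
triangle_holds = norm(total) <= norm(x) + norm(y)

print(scaling_holds, cauchy_schwarz_holds, triangle_holds)
print(abs(inner(x, y)), norm(x) * norm(y))   # 16 versus roughly 81.6
print(norm(total), norm(x) + norm(y))        # roughly 12.2 versus 18.5
```

Equality in the Cauchy-Schwarz inequality occurs only when one vector is a scalar multiple of the other, so the gap between the two sides is another view of how far the pair is from being linearly dependent.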


Some applications of the triangle inequality and the Cauchy-Schwarz inequality are considered in
Exercise 1 and Exercise 15.

Standard inner products, for vector spaces of interest in this text, are compiled in the following table 
(Table 7.1). 


7.1.4 Orthogonality 


When the inner product of two nonzero vectors has a value of zero, we understand that these two
vectors are “maximally” linearly independent because the angle between them is $\pi/2$. In $\mathbb{R}^2$ or $\mathbb{R}^3$, we
know that two such vectors are said to be perpendicular to each other. We can generalize this concept
to other vector spaces.


Definition 7.1.19 


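Let $V$ be an inner product space with inner product $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$. Given two vectors $u, v \in V$,
we say $u$ and $v$ are orthogonal if $\langle u, v\rangle = 0$.


Table 7.1 A table of standard inner products.

Vector Space: $(\mathbb{R}^n, +, \cdot)$
Standard Inner Product: Let $u = (u_1, u_2, \ldots, u_n)^T$ and $v = (v_1, v_2, \ldots, v_n)^T$; then $\langle u, v\rangle = \sum_{j=1}^{n} u_j v_j$ (the dot product).

Vector Space: $(\mathcal{P}_n(\mathbb{R}), +, \cdot)$
Standard Inner Product: $\langle p(x), q(x)\rangle = \int_0^1 p(x)q(x)\, dx$.

Vector Space: $(\mathcal{M}_{m\times n}(\mathbb{R}), +, \cdot)$
Standard Inner Product: Let $A = (a_{ij})$ and $B = (b_{ij})$; then $\langle A, B\rangle = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} b_{ij}$ (the Frobenius inner product).

Vector Space: $(\mathcal{I}_{m\times n}(\mathbb{R}), +, \cdot)$
Standard Inner Product: Let $u$ be the image with pixel intensities $u_{ij}$ and $v$ be the image with pixel intensities $v_{ij}$; then $\langle u, v\rangle = \sum_{i=1}^{m}\sum_{j=1}^{n} u_{ij} v_{ij}$.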

This definition tells us that $0 \in V$ is orthogonal to all vectors in $V$. The concept of orthogonality
can also be extended to sets of more than two vectors.


Definition 7.1.20 


We say that a set of nonzero vectors $\{v_1, v_2, \ldots, v_n\}$ is an orthogonal set if $\langle v_i, v_j\rangle = 0$ whenever
$i \neq j$. If an orthogonal set consists only of unit vectors, then we say that the set is an orthonormal
set.


The first part of Definition 7.1.20 says that the vectors are pairwise orthogonal. The second part 
of Definition 7.1.20 says the vectors all have unit length. Since orthogonal sets consist of pairwise 
linearly independent vectors, we might wonder whether orthogonal sets are linearly independent sets. 


Theorem 7.1.21 


Let $B = \{v_1, v_2, \ldots, v_n\}$ be an orthogonal set of vectors in an inner product space $V$. Then $B$ is
linearly independent.


Proof. Let $V$ be an inner product space. Suppose, also, that $B = \{v_1, v_2, \ldots, v_n\}$ is an orthogonal set of
vectors in $V$. If $B = \{v_1\}$ then it is linearly independent because $v_1 \neq 0$.

Now, suppose $n \geq 2$ and, by way of contradiction, that $B$ is linearly dependent. That is, we assume
without loss of generality that $v_n = a_1 v_1 + a_2 v_2 + \cdots + a_{n-1} v_{n-1}$ for some scalars $a_1, a_2, \ldots, a_{n-1}$.
Now, taking the inner product of both the left- and right-hand sides of this equation with $v_n$, we find

\[
\begin{aligned}
\langle v_n, v_n\rangle &= \langle a_1 v_1 + a_2 v_2 + \cdots + a_{n-1} v_{n-1}, v_n\rangle \\
&= a_1\langle v_1, v_n\rangle + a_2\langle v_2, v_n\rangle + \cdots + a_{n-1}\langle v_{n-1}, v_n\rangle \\
&= a_1 \cdot 0 + a_2 \cdot 0 + \cdots + a_{n-1} \cdot 0 \\
&= 0.
\end{aligned}
\]

Thus, $v_n = 0$, a contradiction. Therefore, $B$ is linearly independent.


In the proof of Theorem 7.1.21, there is a suggestion that if we have a linearly independent set 
we can add a vector, orthogonal to all others, to the set and maintain linear independence. The next 
corollary formalizes this observation. 


Corollary 7.1.22 
Let $V$ be an inner product space, $\{u_1, u_2, \ldots, u_k\}$ a linearly independent subset of $V$ and $w \in V$,
$w \neq 0$. Then $\{u_1, u_2, \ldots, u_k, w\}$ is linearly independent if $\langle u_j, w\rangle = 0$ for $j = 1, 2, \ldots, k$.


Proof. Let $V$ be an inner product space, $S = \{u_1, u_2, \ldots, u_k\}$ a linearly independent subset of $V$,
$w \in V$, $w \neq 0$ and $\langle u_j, w\rangle = 0$ for $j = 1, 2, \ldots, k$. We need only show that $w$ cannot be written as a
linear combination of the vectors in $S$.

Suppose, by way of contradiction, that $w = a_1 u_1 + a_2 u_2 + \cdots + a_k u_k$ for some scalars
$a_1, a_2, \ldots, a_k$. Then the same technique used in the proof of Theorem 7.1.21 gives us

\[
\begin{aligned}
\langle w, w\rangle &= \langle a_1 u_1 + a_2 u_2 + \cdots + a_k u_k, w\rangle \\
&= a_1\langle u_1, w\rangle + a_2\langle u_2, w\rangle + \cdots + a_k\langle u_k, w\rangle \\
&= 0.
\end{aligned}
\]

However, $\langle w, w\rangle = 0$ only if $w = 0$, a contradiction. So, $w$ cannot be written as a linear combination
of $u_1, u_2, \ldots, u_k$. Thus, $\{u_1, u_2, \ldots, u_k, w\}$ is linearly independent.


In the next corollary, we recognize that Theorem 7.1.21 leads us to a “maximally” linearly independent
basis for an inner product space. That is, a sufficiently large orthogonal set in an inner product space
is a basis. We will see how to find such bases in the next section.


Corollary 7.1.23
Let $V$ be an $n$-dimensional inner product space and let $B = \{v_1, v_2, \ldots, v_n\}$ be an orthogonal
set of vectors in $V$. Then $B$ is a basis for $V$.


Proof. By Theorem 7.1.21, we have that $B$ is a linearly independent set of $n$ vectors in $V$. Thus, by
Theorem 3.4.24, $B$ is a basis for $V$.


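Example 7.1.24 Let

\[
B = \left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \\ -1 \end{pmatrix} \right\}.
\]

We can show that $B$ is an orthogonal set in $\mathbb{R}^3$, but is not orthonormal. Indeed,

\[
\left\| \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\| = \sqrt{3} \neq 1.
\]

But,

\[
\begin{aligned}
\left\langle \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \right\rangle &= -1 + 0 + 1 = 0, \\
\left\langle \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \\ -1 \end{pmatrix} \right\rangle &= -1 + 2 - 1 = 0, \\
\left\langle \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \\ -1 \end{pmatrix} \right\rangle &= 1 + 0 - 1 = 0.
\end{aligned}
\]

Therefore, $B$ is an orthogonal set. Furthermore, Corollary 7.1.23 tells us that $B$ is a basis for $\mathbb{R}^3$.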


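Example 7.1.25 Let $C([0, 1])$ be the set of continuous functions defined on $[0, 1]$. Define the inner
product on $C([0, 1])$ as in Example 7.1.3. The set $S = \{\sqrt{2}\cos(n\pi x) \mid n \in \mathbb{N}\}$ is an orthonormal set.

We can show this by considering $n, k \in \mathbb{N}$, $n \neq k$. Then

\[
\begin{aligned}
\left\langle \sqrt{2}\cos(n\pi x), \sqrt{2}\cos(k\pi x)\right\rangle &= \int_0^1 2\cos(n\pi x)\cos(k\pi x)\, dx \\
&= \int_0^1 \cos((n + k)\pi x) + \cos((n - k)\pi x)\, dx \\
&= \frac{\sin((n + k)\pi)}{(n + k)\pi} + \frac{\sin((n - k)\pi)}{(n - k)\pi} \\
&= 0.
\end{aligned}
\]

So, $S$ is an orthogonal set. Now, we show that the vectors in $S$ are unit vectors.

\[
\begin{aligned}
\left\|\sqrt{2}\cos(n\pi x)\right\|^2 &= \int_0^1 2\cos^2(n\pi x)\, dx \\
&= \int_0^1 1 + \cos(2n\pi x)\, dx \\
&= 1 + \frac{\sin(2n\pi)}{2n\pi} \\
&= 1.
\end{aligned}
\]

Thus, $S$ is an orthonormal set.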
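The orthonormality of the cosine family can also be checked numerically. The sketch below approximates the integral inner product on $[0, 1]$ with a midpoint rule, which is a numerical stand-in for the exact integrals above; the names `inner_c01` and `basis`, and the number of subintervals, are assumptions made for illustration.

```python
import math


def inner_c01(f, g, steps=2000):
    """Midpoint-rule approximation of the integral of f*g over [0, 1]."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(steps))


def basis(n):
    """The function sqrt(2) * cos(n * pi * x) for a positive integer n."""
    return lambda x: math.sqrt(2) * math.cos(n * math.pi * x)


for n in range(1, 4):
    row = [round(inner_c01(basis(n), basis(k)), 6) for k in range(1, 4)]
    print(row)
```

Up to rounding, the printed table of inner products is the $3 \times 3$ identity matrix: ones on the diagonal (unit length) and zeros elsewhere (pairwise orthogonality).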


We can use the inner product to find vectors orthogonal to a given vector. This ability will be useful 
in the next section for constructing orthogonal bases. Here we give a few examples. 
Example 7.1.26 Suppose $u \in \mathbb{R}^3$ is defined by $u = (1\ 1\ 1)^T$. Using the standard inner product for
$\mathbb{R}^3$, we see that, for an arbitrary $v = (a\ b\ c)^T$,
\[
\langle u, v\rangle = a + b + c.
\]

To find $v$ orthogonal to $u$, we set $\langle u, v\rangle$ to zero. If $a = 1$, $b = 1$, and $c = -2$, then $v$ is orthogonal to
$u$. There are infinitely many such vectors.


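Example 7.1.27 Suppose we have the vector $p(x) = 1 + x + x^2 \in \mathcal{P}_2(\mathbb{R})$. An orthogonal vector
$q(x) = a + bx + cx^2$ must satisfy $\langle p(x), q(x)\rangle = 0$. If we use the inner product of Example 7.1.3, we find

\[
\begin{aligned}
\langle p(x), q(x)\rangle &= \int_0^1 p(x)q(x)\, dx \\
&= \int_0^1 (1 + x + x^2)(a + bx + cx^2)\, dx \\
&= \int_0^1 \left(a + (a + b)x + (a + b + c)x^2 + (b + c)x^3 + cx^4\right) dx \\
&= a + \frac{1}{2}(a + b) + \frac{1}{3}(a + b + c) + \frac{1}{4}(b + c) + \frac{1}{5}c \\
&= \frac{11}{6}a + \frac{13}{12}b + \frac{47}{60}c.
\end{aligned}
\]

For $p(x)$ and $q(x)$ to be orthogonal, $\langle p(x), q(x)\rangle = 0$. There are many such orthogonal pairs. For
example, we can choose $a = -\frac{47}{110}$, $b = 0$ and $c = 1$. That is, $q(x) = -(47/110) + x^2$ is orthogonal
to $p(x) = 1 + x + x^2$.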


Example 7.1.28 Find a vector in $\mathbb{R}^3$ orthogonal to both $u = (1\ 1\ {-1})^T$ and $v = (0\ 2\ 1)^T$. We seek a vector
$w = (a\ b\ c)^T$ that satisfies $\langle u, w\rangle = 0$ and $\langle v, w\rangle = 0$. These conditions lead to the system of linear
equations:
\[
\begin{aligned}
\langle u, w\rangle &= a + b - c = 0, \\
\langle v, w\rangle &= 2b + c = 0.
\end{aligned}
\]
This system of linear equations has solution set $\mathrm{Span}\left\{(3\ {-1}\ 2)^T\right\}$. If we let $w = (3\ {-1}\ 2)^T$, then by
Corollary 7.1.22, $B = \{u, v, w\}$ is linearly independent. Also, because $\dim \mathbb{R}^3 = 3$, $B$ is a basis for
$\mathbb{R}^3$.


In each of the above examples, we found the set of orthogonal vectors by solving a homogeneous 
system of equations. 
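For the special case of two vectors in $\mathbb{R}^3$, as in Example 7.1.28, the cross product gives a shortcut to a vector orthogonal to both. This is a different technique from the homogeneous system used in the text and applies only in $\mathbb{R}^3$ with the standard inner product; the function names below are illustrative.

```python
def inner(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


u = (1, 1, -1)
v = (0, 2, 1)
w = cross(u, v)

print(w)                        # a vector orthogonal to both u and v
print(inner(u, w), inner(v, w))
```

For these vectors the cross product returns the same spanning vector found in Example 7.1.28, and both inner products evaluate to zero.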
7.1.5 Inner Product and Coordinates


The inner product turns out to be very useful in expressing coordinate representations of vectors. 
Because the inner product measures a degree of linear dependence, scaled by vector lengths, we might 
expect it to quantify the unique basis decomposition built into our idea of coordinates. 

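Consider an example in $\mathbb{R}^2$. Suppose we wish to find the coordinate vector of $v = (2, 1)$ relative to the
basis $B = \left\{u_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, u_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right\}$. We know that $v$ can be written as a unique linear combination
of $u_1$ and $u_2$ with coefficients $a_1$ and $a_2$. That is,

\[
v = a_1 u_1 + a_2 u_2.
\]

Part 4 of Theorem 7.1.17 and the linearity of the inner product guarantee that

\[
\begin{aligned}
\langle u_1, v\rangle &= \langle u_1, a_1 u_1 + a_2 u_2\rangle \\
&= a_1\langle u_1, u_1\rangle + a_2\langle u_1, u_2\rangle,
\end{aligned}
\]

and

\[
\langle u_2, v\rangle = a_1\langle u_2, u_1\rangle + a_2\langle u_2, u_2\rangle.
\]

These two linear equations form the matrix equation

\[
\begin{pmatrix} \langle u_1, u_1\rangle & \langle u_1, u_2\rangle \\ \langle u_2, u_1\rangle & \langle u_2, u_2\rangle \end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
= \begin{pmatrix} \langle u_1, v\rangle \\ \langle u_2, v\rangle \end{pmatrix}.
\]

For the particular vectors of interest, we have

\[
\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
= \begin{pmatrix} 3 \\ 1 \end{pmatrix}.
\]

The solution is

\[
a = [v]_B = \begin{pmatrix} 2 \\ -1 \end{pmatrix}.
\]

The reader can verify that $v = 2u_1 - u_2$.
We can readily generalize this approach.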


Theorem 7.1.29 
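Suppose $B = \{u_1, u_2, \ldots, u_n\}$ is a basis for $n$-dimensional inner product space $V$ and $v \in V$.
The coordinate vector $a = [v]_B$ is the solution to

\[
\begin{pmatrix}
\langle u_1, u_1\rangle & \langle u_1, u_2\rangle & \cdots & \langle u_1, u_n\rangle \\
\langle u_2, u_1\rangle & \langle u_2, u_2\rangle & \cdots & \langle u_2, u_n\rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle u_n, u_1\rangle & \langle u_n, u_2\rangle & \cdots & \langle u_n, u_n\rangle
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
= \begin{pmatrix} \langle u_1, v\rangle \\ \langle u_2, v\rangle \\ \vdots \\ \langle u_n, v\rangle \end{pmatrix}.
\]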


The proof is directly analogous to the previous discussion. 
The following two corollaries show that coordinate computations are sometimes significantly 
simplified with orthogonal or orthonormal bases. 
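Corollary 7.1.30
Suppose $B = \{u_1, u_2, \ldots, u_n\}$ is an orthogonal basis for $n$-dimensional inner product space $V$
and $v \in V$. The coordinate vector $a = [v]_B$ has entries

\[
a_k = \frac{\langle u_k, v\rangle}{\|u_k\|^2} = \frac{\langle u_k, v\rangle}{\langle u_k, u_k\rangle}, \quad \text{for } k = 1, 2, \ldots, n.
\]


Proof. Suppose $B = \{u_1, u_2, \ldots, u_n\}$ is an orthogonal basis for $n$-dimensional inner product space $V$
and $v \in V$. Note that

\[
\langle u_i, u_j\rangle = \begin{cases} 0 & \text{if } i \neq j \\ \|u_i\|^2 & \text{if } i = j. \end{cases}
\]

Thus, the matrix equation in Theorem 7.1.29 is

\[
\begin{pmatrix}
\|u_1\|^2 & 0 & \cdots & 0 \\
0 & \|u_2\|^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \|u_n\|^2
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
= \begin{pmatrix} \langle u_1, v\rangle \\ \langle u_2, v\rangle \\ \vdots \\ \langle u_n, v\rangle \end{pmatrix}.
\]

That is, we find each coordinate $a_k$ by solving

\[
\|u_k\|^2 a_k = \langle u_k, v\rangle \quad \text{for all } k = 1, 2, \ldots, n.
\]

Therefore,

\[
a_k = \frac{\langle u_k, v\rangle}{\|u_k\|^2} \quad \text{for all } k = 1, 2, \ldots, n.
\]


Corollary 7.1.31
Suppose $B = \{u_1, u_2, \ldots, u_n\}$ is an orthonormal basis for $n$-dimensional inner product space $V$
and $v \in V$. The coordinate vector $a = [v]_B$ has entries

\[
a_k = \langle u_k, v\rangle, \quad \text{for } k = 1, 2, \ldots, n.
\]


Proof. Suppose $B = \{u_1, u_2, \ldots, u_n\}$ is an orthonormal basis for $n$-dimensional inner product space
$V$ and $v \in V$. By Corollary 7.1.30, for $k = 1, 2, \ldots, n$:

\[
a_k = \frac{\langle u_k, v\rangle}{\|u_k\|^2}.
\]

However, $\|u_k\| = 1$ for all $k = 1, 2, \ldots, n$. Thus $a_k = \langle u_k, v\rangle$.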
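Example 7.1.32 Consider the basis $B = \{1, x, x^2\}$ for inner product space $\mathcal{P}_2(\mathbb{R})$ with the standard
inner product. We use Theorem 7.1.29 to verify that for $p(x) = 2 + 3x + 4x^2$, $[p]_B = (2, 3, 4) \in \mathbb{R}^3$.
First, we compute the necessary inner products.

\[
\begin{aligned}
\langle 1, 1\rangle &= \int_0^1 dx = 1 \\
\langle 1, x\rangle &= \int_0^1 x\, dx = \frac{1}{2} \\
\langle 1, x^2\rangle &= \int_0^1 x^2\, dx = \frac{1}{3} \\
\langle x, x\rangle &= \int_0^1 x^2\, dx = \frac{1}{3} \\
\langle x, x^2\rangle &= \int_0^1 x^3\, dx = \frac{1}{4} \\
\langle x^2, x^2\rangle &= \int_0^1 x^4\, dx = \frac{1}{5}
\end{aligned}
\]

\[
\begin{aligned}
\langle 1, 2 + 3x + 4x^2\rangle &= 2\langle 1, 1\rangle + 3\langle 1, x\rangle + 4\langle 1, x^2\rangle = \frac{29}{6} \\
\langle x, 2 + 3x + 4x^2\rangle &= 2\langle x, 1\rangle + 3\langle x, x\rangle + 4\langle x, x^2\rangle = 3 \\
\langle x^2, 2 + 3x + 4x^2\rangle &= 2\langle x^2, 1\rangle + 3\langle x^2, x\rangle + 4\langle x^2, x^2\rangle = \frac{133}{60}.
\end{aligned}
\]

The matrix equation is

\[
\begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{pmatrix} [p]_B = \begin{pmatrix} 29/6 \\ 3 \\ 133/60 \end{pmatrix}
\]

with unique solution $[p]_B = (2\ 3\ 4)^T$.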
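The matrix equation of Theorem 7.1.29 can be set up and solved exactly for this example. The sketch below builds the matrix of inner products for the basis $\{1, x, x^2\}$ from $\langle x^i, x^j\rangle = 1/(i+j+1)$ and solves the system by Gauss-Jordan elimination over the rationals. The function name `solve` and the elimination routine are illustrative; any exact linear solver would serve.

```python
from fractions import Fraction


def solve(G, rhs):
    """Solve G a = rhs by Gauss-Jordan elimination with exact fractions.

    Assumes G is square and invertible.
    """
    n = len(G)
    # Augmented matrix [G | rhs] with Fraction entries.
    M = [[Fraction(g) for g in row] + [Fraction(r)] for row, r in zip(G, rhs)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row, then clear the column in all other rows.
        M[col] = [entry / M[col][col] for entry in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n] for row in M]


# Inner products of the basis {1, x, x^2}: <x^i, x^j> = 1 / (i + j + 1).
G = [[Fraction(1, i + j + 1) for j in range(3)] for i in range(3)]

# Right-hand side <x^i, p> for p(x) = 2 + 3x + 4x^2, by linearity.
p = [2, 3, 4]
rhs = [sum(c * G[i][j] for j, c in enumerate(p)) for i in range(3)]

coords = solve(G, rhs)
print(rhs)
print(coords)
```

The recovered coordinates match the coefficients of $p$, as expected for the basis $\{1, x, x^2\}$.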


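Example 7.1.33 Consider the orthogonal (ordered) basis for $\mathbb{R}^3$:
\[
B = \left\{ \begin{pmatrix} 1 \\ -4 \\ -7 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix} \right\}.
\]

Using Corollary 7.1.30, we find $[v]_B$ for $v = (6\ {-1}\ {-8})^T$. Let $a = [v]_B$ and $B = \{b_1, b_2, b_3\}$. Then we
have

\[
\begin{aligned}
a_1 &= \frac{\langle b_1, v\rangle}{\langle b_1, b_1\rangle} = \frac{6 + 4 + 56}{1 + 16 + 49} = 1, \\
a_2 &= \frac{\langle b_2, v\rangle}{\langle b_2, b_2\rangle} = \frac{6 - 2 + 8}{1 + 4 + 1} = 2, \\
a_3 &= \frac{\langle b_3, v\rangle}{\langle b_3, b_3\rangle} = \frac{18 + 1 - 8}{9 + 1 + 1} = 1.
\end{aligned}
\]

Thus,
\[
[v]_B = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}.
\]

The reader should verify that $B$ is indeed orthogonal and that $v = b_1 + 2b_2 + b_3$.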
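The verification suggested at the end of Example 7.1.33 takes only a few lines. This sketch checks pairwise orthogonality of the basis, computes the coordinates with the formula of Corollary 7.1.30, and rebuilds $v$ from them; the variable and function names are illustrative.

```python
from fractions import Fraction


def inner(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


B = [(1, -4, -7), (1, 2, -1), (3, -1, 1)]
v = (6, -1, -8)

# Pairwise orthogonality of the basis vectors.
orthogonal = all(inner(B[i], B[j]) == 0
                 for i in range(3) for j in range(i + 1, 3))

# Coordinates a_k = <b_k, v> / <b_k, b_k> (Corollary 7.1.30).
coords = [Fraction(inner(b, v), inner(b, b)) for b in B]

# Rebuild v as the linear combination a_1 b_1 + a_2 b_2 + a_3 b_3.
rebuilt = tuple(sum(a * b[i] for a, b in zip(coords, B)) for i in range(3))

print(orthogonal, coords, rebuilt)
```

Because the basis is orthogonal, each coordinate is found independently of the others, with no linear system to solve.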


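Example 7.1.34 The heat diffusion transformation in $H_3(\mathbb{R})$ in the standard basis, $S$, has representation

\[
[E]_S = \begin{pmatrix} 1/2 & 1/4 & 0 \\ 1/4 & 1/2 & 1/4 \\ 0 & 1/4 & 1/2 \end{pmatrix}.
\]

This matrix transformation has orthonormal eigenbasis
\[
B = \{b_1, b_2, b_3\} = \left\{ \begin{pmatrix} 1/2 \\ 1/\sqrt{2} \\ 1/2 \end{pmatrix}, \begin{pmatrix} 1/\sqrt{2} \\ 0 \\ -1/\sqrt{2} \end{pmatrix}, \begin{pmatrix} 1/2 \\ -1/\sqrt{2} \\ 1/2 \end{pmatrix} \right\}.
\]

We now find the heat state $[h]_B$ for $[h]_S = (1\ 2\ 3)^T$.
We seek coefficients $[h]_B = a = (a_1\ a_2\ a_3)^T$ so that $[h]_S = a_1 b_1 + a_2 b_2 + a_3 b_3$. Using Corollary 7.1.31,
we have

\[
\begin{aligned}
a_1 &= \left\langle \begin{pmatrix} 1/2 \\ 1/\sqrt{2} \\ 1/2 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \right\rangle = 2 + \sqrt{2}, \\
a_2 &= \left\langle \begin{pmatrix} 1/\sqrt{2} \\ 0 \\ -1/\sqrt{2} \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \right\rangle = -\sqrt{2}, \\
a_3 &= \left\langle \begin{pmatrix} 1/2 \\ -1/\sqrt{2} \\ 1/2 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \right\rangle = 2 - \sqrt{2}.
\end{aligned}
\]

Thus, $[h]_B = (2 + \sqrt{2}\ \ {-\sqrt{2}}\ \ 2 - \sqrt{2})^T$. The reader should verify that $B$ is indeed an orthonormal eigenbasis for
$[E]_S$ and that $[h]_S = (2 + \sqrt{2})b_1 + (-\sqrt{2})b_2 + (2 - \sqrt{2})b_3$.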


Path to New Applications

Clustering and support vector machines are used to classify data. Classification is based on 
measuring similarity between data vectors. Inner products provide one method for measuring 
similarity. See Section 8.3.2 to read more about connections between linear algebra tools and 
these techniques along with other machine learning techniques. In Exercise 30, we ask you to 
begin exploring linear algebra techniques used for data classification. 


7.1.6 Exercises 


1. For each set of given vectors in $\mathbb{R}^n$, compute $\langle u, v\rangle$, $\|u\|$, $\|v\|$, $\|u + v\|$ and show that both the
Cauchy-Schwarz and Triangle Inequalities hold. Use the standard inner product on $\mathbb{R}^n$.
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ie: aia 2004)" 
(f) a ee 


2. Let $b_k \in \mathbb{R}^4$ represent the annual revenues of business $k$ over the four consecutive fiscal years 2018 to
2021. The revenue vectors of five businesses are given here:

\[
\begin{aligned}
b_1 &= (2\ 3\ 4\ 5)^T, \\
b_2 &= (3\ 3\ 1\ 6)^T, \\
b_3 &= (0\ 0\ 5\ 5)^T, \\
b_4 &= (2\ 3\ 2\ 4)^T, \\
b_5 &= (2\ 3\ 4\ 4)^T.
\end{aligned}
\]

Which of the five businesses has a revenue vector that is farthest from the vector of median
values? Answer this question using two different vector norms using the standard inner product
in $\mathbb{R}^4$:

(a) The Euclidean norm: $\|x\| = \langle x, x\rangle^{1/2}$.

(b) The weighted norm: $\|x\| = \langle x, Ax\rangle^{1/2}$ where $A$ =


oo Oa- 
Oo OA © 
of Oo CO 
Ni- O © © 


3. Consider the vector space $\mathcal{M}_{2\times 3}(\mathbb{R})$ with the Frobenius inner product. Determine the subset of
matrices $M \subseteq \mathcal{M}_{2\times 3}(\mathbb{R})$ containing all matrices orthogonal to both


120 000 
rece, md B= (554) 


show that $M$ is a subspace of $\mathcal{M}_{2\times 3}(\mathbb{R})$ and find a basis for $M$.

4. Consider the three images of Example 7.1.6. Suppose each image was normalized (scaled to 
unit length using the standard image inner product). Describe any differences in the grayscale 
representation of these new images. 

5. Consider the vector space of continuous functions $f : [1, 2] \to \mathbb{R}$ and inner product $\langle f, g\rangle = \int_1^2 f(x)g(x)\,dx$. Argue that the vector space norm $\|f\| = \langle f, f\rangle^{1/2}$ is an intuitive choice for measuring vector length. Use this norm to find a function $g \in \mathcal{P}_1(\mathbb{R})$ closest in norm to $f(x) = 1/x$ on the interval $[1, 2]$.

6. Using the ideas in Exercise 5, find the polynomial $f(x) \in \mathcal{P}_2(\mathbb{R})$ that most closely approximates the following functions on the given intervals. Plot the functions and discuss your results.

(a) $g(x) = \sin x$ on $[0, \pi]$.
(b) $h(x) = \sqrt{x}$ on $[0, 1]$.
(c) $r(x) = |x|$ on $[-1, 1]$.
(d) $t(x) = 1/x^2$ on $[1, 2]$.

7. A scientist recorded five biochemical properties of the seeds of several varieties of pumpkin. Later it was determined that the pumpkin fruit that was the most successful, in terms of size and consumer acceptance, came from the seed variety with data vector

$u = (6\ 13\ 4\ 17\ 10)^T$.

Before the following growing season, the same biochemical signatures were obtained for three new varieties:

$x_1 = (8\ 10\ 4\ 16\ 10)^T$,
$x_2 = (10\ 17\ 4\ 13\ 6)^T$,
$x_3 = (7\ 11\ 5\ 15\ 9)^T$.

Describe and justify a reasonable (linear algebra) method for determining which of the three new
varieties is most likely to be successful. Test your method.


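8. Prove or disprove the following. Claim: Suppose $\langle\cdot,\cdot\rangle_1$ and $\langle\cdot,\cdot\rangle_2$ are both inner products on vector space $V$. Then, $\langle\cdot,\cdot\rangle = \langle\cdot,\cdot\rangle_1 + \langle\cdot,\cdot\rangle_2$ is an inner product on $V$.

9. This exercise shows that a vector space can have many different inner products and vector norms. Consider the vector space $\mathbb{R}^2$ and define $\langle x, y\rangle = a x_1 y_1 + x_2 y_2$, where $a > 0$ is a scalar. Show that $\langle\cdot,\cdot\rangle : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ is an inner product.

10. Let $\langle\cdot,\cdot\rangle : \mathcal{P}_2(\mathbb{R}) \times \mathcal{P}_2(\mathbb{R}) \to \mathbb{R}$ be defined by

\[ \langle p_1, p_2\rangle = \int_0^1 p_1 p_2 \, dx, \quad \text{for } p_1, p_2 \in \mathcal{P}_2(\mathbb{R}). \]

For what values of $c$ is the vector $p(x) = x^2 + x + c$ a unit vector?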
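11. Let $\langle\cdot,\cdot\rangle : \mathcal{P}_2(\mathbb{R}) \times \mathcal{P}_2(\mathbb{R}) \to \mathbb{R}$ be defined by

\[ \langle p_1(x), p_2(x)\rangle = \int_0^1 p_1'(x)\, p_2'(x) \, dx, \quad \text{for } p_1, p_2 \in \mathcal{P}_2(\mathbb{R}). \]

Is $\langle p_1(x), p_2(x)\rangle$ an inner product on $\mathcal{P}_2(\mathbb{R})$?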

12. Consider the function $f : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ defined by $f(x, y) = |x_1 y_1| + |x_2 y_2| + |x_3 y_3|$. Is $f$ an inner product on $\mathbb{R}^3$?

13. Consider the standard inner product $\langle\cdot,\cdot\rangle : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. For what scalars $p$ is $\langle\cdot,\cdot\rangle^p : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ also an inner product?

14. Show that the function $f(x, y)$ defined in Example 7.1.11 does not satisfy the inner product linearity condition $f(u + v, w) = f(u, w) + f(v, w)$ and is therefore not an inner product on $D(\mathbb{Z}_2)$.

15. Show that $\|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2\langle u, v\rangle$. Relate this result to the law of cosines in $\mathbb{R}^2$.

16. Consider an orthogonal set of vectors $U = \{u_1, u_2, \ldots, u_n\}$ in vector space $V$ with inner product $\langle\cdot,\cdot\rangle$. Suppose $v \in \operatorname{Span} U$. Write $\|v\|$ in terms of $\|u_k\|$, $k = 1, 2, \ldots, n$.

17. Consider the given functions $f : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$. For each, show that $f$ is an inner product on $\mathbb{R}^2$ and sketch the set of normal vectors.

(a) $f(x, y) = 4x_1 y_1 + 4x_2 y_2$.


(b) $f(x, y) = x_1 y_1 + 9x_2 y_2$.
(c) $f(x, y) = 2x_1 y_1 + x_1 y_2 + x_2 y_1 + 2x_2 y_2$.

18. Let $\ell(x)$ be the Euclidean length of any vector $x \in \mathbb{R}^2$, and $f(x, y)$ be the cosine of the angle between any two vectors $x, y \in \mathbb{R}^2$. Prove that the function $\langle\cdot,\cdot\rangle : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$, defined by $\langle x, y\rangle = \ell(x)\ell(y) f(x, y)$, is linear in the first argument.

19. Prove Theorem 7.1.16.

20. Provide a direct proof of Theorem 7.1.21.

21. Consider the vector space of 7-bar LCD digits, $D(\mathbb{Z}_2)$. Let $u_k$ be the value of the $k$th bar for $u \in D(\mathbb{Z}_2)$. Determine whether or not each of the given $f : D(\mathbb{Z}_2) \times D(\mathbb{Z}_2) \to \mathbb{Z}_2$ is an inner product on $D(\mathbb{Z}_2)$.


(a) $f(u, v) = \sum_{k=1}^{7} u_k v_k$

(b) $f(u, v) = \max_k (u_k v_k)$

(c) $f(u, v) = \begin{cases} 0 & \text{if } u = 0 \text{ or } v = 0 \\ 1 & \text{otherwise} \end{cases}$


22. Find the coordinate vector $[v]_{\mathcal{B}}$ for the vector $v$ in inner product space $V$ for the given basis $\mathcal{B}$. Use the standard inner product and methods of Section 7.1.5. Some of the given bases are orthogonal, others are not.
(a) V=R2,0 = (33), 
(b) V =R?,v =(33)', 
(c) V=R?, v= (12-1) 
@) V=R3,v=(12-1)',B= 
(e:) V=R4,v = (1234) 

19 =¥3 0) ,(0 20 =¥2) .(4010) .(o20%2). 

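(f) $V = \mathcal{P}_2(\mathbb{R})$, $v = x + 3$, $\mathcal{B} = \{1 + x,\ 2 - x,\ x^2\}$.

(g) $V = \mathcal{P}_2(\mathbb{R})$, $v = x + 3$, $\mathcal{B} = \{1,\ 2x - 1,\ 6x^2 - 6x + 1\}$.

(h) $V = \mathcal{P}_2(\mathbb{R})$, $v = x + 3$, $\mathcal{B} = \{1,\ \sqrt{12}\,x - \sqrt{3},\ \sqrt{180}\,x^2 - \sqrt{180}\,x + \sqrt{5}\}$.

(i) $V = \mathcal{F}(\mathbb{R})$, $v = \sin(\pi x)$, $\mathcal{B} = \{\sqrt{2}\cos n\pi x \mid n \in \mathbb{N}\}$. (See Example 7.1.25.)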


11) ‘. 

(1 ne iy 
= {(110)',(012)', (101). 
(1 10)", (-110)', (001)"}. 


The following exercises explore inner product spaces over the field of complex numbers $\mathbb{C}$.

23. Let $u, v \in \mathbb{C}^n$. Show that in order to preserve the concept of vector length $\|u\| = \sqrt{\langle u, u\rangle}$, the standard inner product should be defined as

\[ \langle u, v\rangle = u_1 \bar{v}_1 + u_2 \bar{v}_2 + \ldots + u_n \bar{v}_n, \]

where $\bar{x}$ indicates the complex conjugate of $x$.

24. Using the standard inner product of Exercise 23, show that the symmetry property in Definition 7.1.1 no longer holds. Show that

\[ \langle u, v\rangle = \overline{\langle v, u\rangle} \]

holds for both real and complex inner product spaces.

25. Generalize Theorem 7.1.16 for complex inner product spaces and provide a proof.

26. Generalize Theorem 7.1.17 for complex inner product spaces and provide a proof.

27. Show that the zero vector is the only vector orthogonal to nonzero vector $z = a + bi \in \mathbb{C}^1$.

28.


1+i 


Find the set X of all vectors orthogonal to € 7 ) € C*. Show that X is a one-dimensional 


subspace of C? by finding a basis for X. 

29. Given any nonzero vector $v$ in $\mathbb{R}^n$, prove that its orthogonal complement $v^\perp$ is an $(n - 1)$-dimensional subspace.

30. (Data Classification.) Any $(n - 1)$-dimensional subspace (or more generally, any $(n - 1)$-dimensional hyperplane³) divides $\mathbb{R}^n$ into two regions. (See Exercise 11 of Section 3.5.) This fact is very useful in data classification: if we are interested in developing a classifier for points, we look for subspaces that separate data points with different characteristics. The following exercises point us in this direction.


(a) Graph the following two sets of points in $\mathbb{R}^2$:


A ={G, 4), (5, 1), (—3, —3), (-6, —4)}, 


B = {(8, —3), (0, —4), (1, —6), (—2, —6)}. 


(b) Which of the following subspaces separate the sets in the previous part? Sketch the subspaces
on your graph from part (a).
i. $\operatorname{Span}\{(0, 1)^T\}$
ii. $\operatorname{Span}\{(1, 1)^T\}$
iii. $\operatorname{Span}\{(6, 7)^T\}$
iv. $\operatorname{Span}\{(-1, 1)^T\}$
v. $\operatorname{Span}\{(3, 5)^T\}$

(c) If the sets in part (a) are representative of two populations, which subspace from part (b) 
would be the best distinguisher between the populations? Why? 

(d) Reconsider the subspace that you selected in part (c). Can you shift the subspaces by a 
nonzero vector to get a hyperplane that better separates the populations? What shift(s) 
would be best? 

(e) Now let’s use the language of inner products to make your observations more formal. For 
the subspace from part (c) and the hyperplane from part (d), do the following. Compute the 
distance from each point in set A to the subspace/hyperplane and take the minimum. Do 
the same for distances from points in set B to the subspace/hyperplane. What do you notice 
about the shifted hyperplane you selected in part (d) compared to the original subspace? 

(f) Based on your responses, make a conjecture about what properties you would like a 
separating hyperplane to have in order for it to separate two classes of data vectors. These 
ideas lead to the development of a data analytics tool called a support vector machine; see 
Section 8.3.2 for more information. 


³ A $k$-dimensional hyperplane is a translation, possibly by zero, of a $k$-dimensional subspace.


7.2 Projections


In Section 7.1, we discussed a notion of how close two vectors are to being linearly dependent. The inner product of two vectors provides a scalar value indicative of how close two vectors are to being scalar multiples of one another. Nonzero orthogonal vectors have no components that are scalar multiples of one another. In this section, we want to take this further. We would like to understand (a) how close a vector is to a given subspace and (b) how to express a vector as a sum of two vectors: one in a subspace and the other orthogonal to the subspace. Consider several examples:


1. Suppose a subspace of brain images is known to be associated with a particular condition. We 
might wish to compare a given image with this subspace of images. Is our image in the subspace? 
If not, is it close? 

2. Suppose the heat stress on a rod in a welding apparatus is closely linked to a particular subspace 
of heat states. If a heat state has a large component in this subspace, it may be indicative of high 
stress, a smaller component may be indicative of lower stress. 

3. Suppose we wish to know if an object in an imaging system has a large nullspace component, so 
that we can better understand our imaging system. We may wish to have some idea of how much 
of our object information can be recovered from an inverse (object reconstruction) process. 

4. Suppose we have two brain images that have been reconstructed from two different radiographs. 
We may wish to know if these two images differ by something significant or by something of little 
importance. Again, this can be measured using a known subspace of significant (or insignificant) 
features. 

5. Suppose we have a polynomial function that describes a production strategy over time. We may 
also know a subspace of functions that are known to have high marketing value. How close is our 
strategy to this subspace of desirable strategies? 


The key idea in this section is the partitioning of vector spaces into direct sums of subspaces that 
are of particular interest in an application, and the description of vectors in terms of these subspaces. 
Along the way, we will need to understand the bases for these subspaces. We will also understand that 
distances are best measured in terms of orthogonal subspaces and orthogonal bases, so we will need 
to understand how to construct them. 

Let us consider a visual example in $\mathbb{R}^3$. Suppose $v \in \mathbb{R}^3$ and $W$ is a two-dimensional subspace of $\mathbb{R}^3$. We want to find $u \in \mathbb{R}^3$ and $w \in W$ so that $v = u + w$ and $u$ is orthogonal to every vector in $W$. Geometrically, we see that $W$ is a plane in $\mathbb{R}^3$ that passes through the origin, $u$ is orthogonal (perpendicular) to the plane, and $w$ lies entirely within the plane. See Figure 7.2. We say that $w$ is the closest vector to $v$ within $W$. We can say that $\|w\| / \|v\|$ is a measure of how close $v$ is to being within $W$.

Fig. 7.2 The vector $v \in \mathbb{R}^3$ is decomposed into a vector $u$, orthogonal to all vectors in $W$ (a subspace of dimension 2), and a vector $w \in W$.

Next, consider the example of digital images in $\mathcal{I}_{4\times 4}(\mathbb{R})$. In Section 2.1, we found that

\[ \text{Image 4} \in \mathcal{I}_{4\times 4}(\mathbb{R}), \quad \text{but} \quad \text{Image 4} \notin \operatorname{Span}\{\text{Image A}, \text{Image B}, \text{Image C}\}. \]

Is there an image in the subspace spanned by Images A, B, and C that is “closest,” to Image 4? If so, 
we will call this image, the projection of image v onto the subspace spanned by Images A, B, and C. 
While Image 4 cannot be written as a linear combination of Images A, B, and C, it may be that the 
projection, which can be written as such a linear combination, is close to Image 4. 

In this section, we will formally define projections as functions from a vector space into a subspace 
of the vector space satisfying certain properties⁴. First, we will define coordinate projections, which
don’t require any notion of orthogonality or inner product. Then we will define a subclass of coordinate 
projections called orthogonal projections. We will discuss how orthogonal projections can be used 
to find the vector closest to another vector in a subspace. Finally, since orthogonal subspaces and 
orthogonal bases are needed for computing these projections, we will describe a technique for finding 
an orthogonal basis called the Gram-Schmidt process. 


7.2.1 Coordinate Projection 


Our first attempt at defining a projection mapping is based on coordinates. Consider the case of $\mathbb{R}^3$ with the standard basis for motivation. Suppose we want a projection from the whole space to the $xy$-plane. It would be natural to map any set of vectors to its “shadow” in the $xy$-plane, as shown in Figure 7.3. Algebraically, this map is just the map $T$ that sets the $z$ coordinate to zero:

\[ T : \mathbb{R}^3 \to \mathbb{R}^3, \qquad T\begin{pmatrix} x \\ y \\ z \end{pmatrix} := \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}. \]


We can extend this idea to arbitrary vector spaces using bases and coordinates. Suppose we have a vector space $V$ with basis $\beta = \{u_1, u_2, \ldots, u_n\}$. We know the basis determines coordinates for $V$, so we can project onto the span of any subset of these coordinate vectors by setting the other coordinates to zero. More specifically, we can separate (arbitrarily) the basis into two sets, say, $\{u_1, \ldots, u_k\}$ and $\{u_{k+1}, \ldots, u_n\}$. Then by properties of a basis, any vector $v \in V$ is uniquely expressed as $v = v_1 + v_2$ where $v_1 \in W_1 = \operatorname{Span}\{u_1, \ldots, u_k\}$ and $v_2 \in W_2 = \operatorname{Span}\{u_{k+1}, \ldots, u_n\}$. Then $V = W_1 \oplus W_2$, and we can define a coordinate projection onto $W_1$ (or $W_2$) in the following way.

⁴ In the larger mathematical context, a projection is any function from a set into a subset that maps any point in the subset to itself, but the projections we will introduce satisfy additional properties related to linear algebra.

Fig. 7.3 Projection of $v \in \mathbb{R}^3$ onto the $xy$-plane results in the “shadow” seen in gray.


Definition 7.2.1

Let $V$ be an inner product space over $\mathbb{R}$ and $W_1$ and $W_2$ subspaces of $V$ with bases $\beta_1 = \{u_1, \ldots, u_k\}$ and $\beta_2 = \{u_{k+1}, \ldots, u_n\}$, respectively, such that $V = W_1 \oplus W_2$. Let $v = a_1 u_1 + a_2 u_2 + \ldots + a_n u_n$. Then, $v_1 = a_1 u_1 + \ldots + a_k u_k$ is said to be the coordinate projection of $v$ onto $W_1$, $v_2 = a_{k+1} u_{k+1} + \ldots + a_n u_n$ is said to be the coordinate projection of $v$ onto $W_2$, and we write $v_1 = \operatorname{cproj}_{W_1} v$ and $v_2 = \operatorname{cproj}_{W_2} v$.


The idea of coordinate projections generalizes naturally to an arbitrary number of subspaces $W_1, W_2, \ldots, W_j, \ldots, W_m$, provided that $V = W_1 \oplus \ldots \oplus W_m$. Each coordinate projection $v_j$ depends upon all subspaces. That is, we cannot compute any $v_j$ without knowing the entire set of basis vectors and all coordinate values. The following three examples illustrate this.


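Example 7.2.2 Consider the inner product space $\mathbb{R}^2$ with basis

\[ \beta = \{u_1, u_2\} = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \end{pmatrix} \right\}. \]

Let
\[ W_1 = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\} \quad \text{and} \quad W_2 = \operatorname{Span}\left\{ \begin{pmatrix} -1 \\ 0 \end{pmatrix} \right\}. \]

Consider $v = (1, 4)^T$. We find the coordinate projection of $v$ onto $W_1$.

First, note that $\mathbb{R}^2 = W_1 \oplus W_2$ and we can write $v$ as the linear combination

\[ v = 2\begin{pmatrix} 1 \\ 2 \end{pmatrix} + 1\begin{pmatrix} -1 \\ 0 \end{pmatrix}. \]

Thus,
\[ v_1 = \operatorname{cproj}_{W_1} v = \begin{pmatrix} 2 \\ 4 \end{pmatrix}. \]

Figure 7.4 graphically shows the coordinate projection onto $W_1$. Exercise 2 asks you to calculate the coordinate projection onto $W_2$.

Fig. 7.4 Graphical depiction of Example 7.2.2: $v = 2u_1 + u_2$, so $\operatorname{cproj}_{W_1} v = 2u_1$.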


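Example 7.2.3 Consider Example 7.2.2, replacing the second basis vector by $u_2 = (1, 1)^T$ so that

\[ \beta = \{u_1, u_2\} = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}, \]
\[ W_1 = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\} \quad \text{and} \quad W_2 = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}. \]

Now we write $v$ as a linear combination using the new basis

\[ v = 3\begin{pmatrix} 1 \\ 2 \end{pmatrix} - 2\begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

Hence,

\[ v_1 = \operatorname{cproj}_{W_1} v = \begin{pmatrix} 3 \\ 6 \end{pmatrix}. \]

Figure 7.5 graphically shows the coordinate projection onto $W_1$. Exercise 2 asks you to calculate the coordinate projection onto $W_2$.

Fig. 7.5 Graphical depiction of Example 7.2.3: $v = 3u_1 - 2u_2$, so $\operatorname{cproj}_{W_1} v = 3u_1$.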


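Example 7.2.4 Consider once again Example 7.2.2, this time replacing the second basis vector by $u_2 = (1, -1/2)^T$ so that

\[ \beta = \{u_1, u_2\} = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ -1/2 \end{pmatrix} \right\}, \]
\[ W_1 = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\} \quad \text{and} \quad W_2 = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ -1/2 \end{pmatrix} \right\}. \]

With some behind-the-scenes trigonometry (try it yourself!) we write $v$ as

\[ v = 1.8\begin{pmatrix} 1 \\ 2 \end{pmatrix} - 0.8\begin{pmatrix} 1 \\ -1/2 \end{pmatrix}. \]

Hence,

\[ v_1 = \operatorname{cproj}_{W_1} v = \begin{pmatrix} 1.8 \\ 3.6 \end{pmatrix}. \]

Fig. 7.6 Graphical depiction of Example 7.2.4: $v = 1.8u_1 - 0.8u_2$, so $\operatorname{cproj}_{W_1} v = 1.8u_1$.

Figure 7.6 graphically shows the coordinate projection onto $W_1$. Exercise 2 asks you to calculate the coordinate projection onto $W_2$.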
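The three coordinate projections above can be reproduced with a few lines of code. This is a minimal sketch rather than the authors' method: it assumes $u_1 = (1, 2)^T$, $v = (1, 4)^T$, and the three second basis vectors $(-1, 0)^T$, $(1, 1)^T$, and $(1, -1/2)^T$, and it solves $v = a_1 u_1 + a_2 u_2$ by Cramer's rule. The helper name `cproj_w1` is hypothetical.

```python
from fractions import Fraction

def cproj_w1(u1, u2, v):
    """Return (a1, a1*u1), where v = a1*u1 + a2*u2 in R^2 (Cramer's rule)."""
    det = u1[0] * u2[1] - u1[1] * u2[0]
    if det == 0:
        raise ValueError("u1 and u2 do not form a basis of R^2")
    a1 = Fraction(v[0] * u2[1] - v[1] * u2[0]) / Fraction(det)
    return a1, (a1 * u1[0], a1 * u1[1])

u1, v = (1, 2), (1, 4)
choices = {
    "Example 7.2.2": (-1, 0),
    "Example 7.2.3": (1, 1),
    "Example 7.2.4": (1, Fraction(-1, 2)),  # orthogonal to u1
}
results = {name: cproj_w1(u1, u2, v) for name, u2 in choices.items()}
for name, (a1, v1) in results.items():
    print(name, a1, v1)
```

The subspace $W_1$ and the vector $v$ never change here; only the complementary basis vector does, which is exactly why the three projections differ.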


In each of these three examples, the subspace $W_1$, the basis for $W_1$, and the vector $v$ were the same, but with different choices of the basis for $W_2$ we found different coordinate projections onto $W_1$. Our desire for measures of closeness and similarity will guide us in making advantageous basis choices. One particularly useful aspect of our definition of a coordinate projection is that it can be applied to vector spaces other than $\mathbb{R}^n$:

Example 7.2.5 Consider the inner product space $\mathcal{P}_2(\mathbb{R})$ with basis

\[ \beta = \{u_1 = 1 + x,\ u_2 = x,\ u_3 = 1 + x^2\}. \]

Let $W_1 = \operatorname{Span}\{u_1, u_2\}$ and $W_2 = \operatorname{Span}\{u_3\}$. Consider an arbitrary polynomial $p(x) = a + bx + cx^2 \in \mathcal{P}_2(\mathbb{R})$. Then, $p(x)$ is uniquely written as

\[ p(x) = (a - c)u_1 + (-a + b + c)u_2 + (c)u_3. \]

So, we have
\[ p_1(x) = \operatorname{cproj}_{W_1} p(x) = (a - c)u_1 + (-a + b + c)u_2 = a - c + bx \]
and
\[ p_2(x) = \operatorname{cproj}_{W_2} p(x) = c\,u_3 = c + cx^2. \]

Notice that $p(x) = p_1(x) + p_2(x)$ for any $a, b, c \in \mathbb{R}$.
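As a quick check of Example 7.2.5, the split of $p(x) = a + bx + cx^2$ can be written as a short function. This is an illustrative sketch: polynomials are stored as coefficient triples (constant, $x$, $x^2$), and the name `cproj_p2` is an assumption made for the example.

```python
def cproj_p2(a, b, c):
    """Split p(x) = a + b x + c x^2 along W1 = Span{1+x, x} and W2 = Span{1+x^2}.

    Polynomials are coefficient triples (constant, x, x^2).
    """
    k1, k2, k3 = a - c, -a + b + c, c   # coordinates relative to u1, u2, u3
    u1, u2, u3 = (1, 1, 0), (0, 1, 0), (1, 0, 1)
    p1 = tuple(k1 * s + k2 * t for s, t in zip(u1, u2))  # cproj onto W1
    p2 = tuple(k3 * s for s in u3)                       # cproj onto W2
    return p1, p2

# p(x) = 3 - 2x + 5x^2
p1, p2 = cproj_p2(3, -2, 5)
print(p1, p2)
```

For this input the two pieces are $-2 - 2x$ and $5 + 5x^2$, which add back to the original polynomial.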


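Example 7.2.6 Consider the vector space of 7-bar LCD characters, $D(\mathbb{Z}_2)$, with basis

[Basis $\mathcal{B}$: LCD character images]

If we consider $D(\mathbb{Z}_2) = W_1 \oplus W_2$ where

[LCD character images defining $W_1$ and $W_2$ as spans, and the character $d$]

Then we have $d = d_1 + d_2$, where $d_1 = \operatorname{cproj}_{W_1} d$ and $d_2 = \operatorname{cproj}_{W_2} d$.

[LCD character images of $d_1$ and $d_2$]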


7.2.2 Orthogonal Projection


Coordinate projections are useful for describing vector space decomposition into subspaces whose direct sum is the parent vector space. However, we may not know, or may not be interested in, all of the possible subspace decompositions. For example, suppose we have identified a subspace of brain images that indicate a certain brain condition. Understanding how close a brain image is to being in this subspace might give us information about whether the brain has the condition.

Suppose $\beta_1 = \{u_1, u_2, \ldots, u_k\}$ and $X = \operatorname{Span} \beta_1$ is a subspace of brain images for which a certain medical condition is indicated. Given a brain image $v$, we want to know how close $v$ is to being in $X$. One approach would be to use a coordinate projection $v = v_1 + v_2$ where $v_1$ is in the subspace $X$ and then measure $\|v_1\| / \|v\|$ as an indication of closeness. However, we have seen (for instance, in Examples 7.2.2, 7.2.3, and 7.2.4) that $v_1$ depends upon our choice of basis for describing the remainder of the vector space not in $X$. Notice that among Examples 7.2.2, 7.2.3, and 7.2.4, the orthogonal basis in Example 7.2.4 yields the shortest vector $v_2$ (in Figures 7.4, 7.5, and 7.6). This leads us to conclude that the most natural choice of remaining basis vectors $\beta_2 = \{u_{k+1}, \ldots, u_n\}$ is the one whose elements are all orthogonal to the elements of $\beta_1$. That is, we wish $v_2$ to measure how much of the vector $v$ is not in $X$.

Toward this end, we define the orthogonal complement of a set of vectors, which we will use to 
identify appropriate bases for projection. 


Definition 7.2.7
Let $V$ be an inner product space and $W \subseteq V$. The orthogonal complement of $W$, denoted $W^\perp$ and read “W perp”, is the set

\[ W^\perp := \{v \in V \mid \langle v, y\rangle = 0 \text{ for all } y \in W\}. \]

In words, $W^\perp$ is just the set of vectors orthogonal to all vectors in $W$.


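Example 7.2.8 Consider $y = (1, -2, 3)^T \in \mathbb{R}^3$. We find $\{y\}^\perp$ by finding all vectors $v \in \mathbb{R}^3$ such that $\langle v, y\rangle = 0$. So, we have

\begin{align*}
\{y\}^\perp &= \left\{ \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in \mathbb{R}^3 \;\middle|\; \left\langle \begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix}, \begin{pmatrix} a \\ b \\ c \end{pmatrix} \right\rangle = 0 \right\} \\
&= \left\{ \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in \mathbb{R}^3 \;\middle|\; a - 2b + 3c = 0 \right\} \\
&= \left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} b + \begin{pmatrix} -3 \\ 0 \\ 1 \end{pmatrix} c \in \mathbb{R}^3 \;\middle|\; b, c \in \mathbb{R} \right\} \\
&= \operatorname{Span}\left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -3 \\ 0 \\ 1 \end{pmatrix} \right\}.
\end{align*}

We see $\{y\}^\perp$ in Figure 7.7.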


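Fig. 7.7 The orthogonal complement $\{y\}^\perp$ of $y = (1, -2, 3)^T$ in Example 7.2.8 is the plane perpendicular to $y$.

Example 7.2.9 Consider the set

\[ W = \left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 3 \\ -1 \end{pmatrix} \right\} \subseteq \mathbb{R}^3. \]

We find $W^\perp$ by finding all vectors $v \in \mathbb{R}^3$ such that both

\[ \left\langle \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, v \right\rangle = 0 \quad \text{and} \quad \left\langle \begin{pmatrix} 0 \\ 3 \\ -1 \end{pmatrix}, v \right\rangle = 0. \]

Then we have

\begin{align*}
W^\perp &= \left\{ \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in \mathbb{R}^3 \;\middle|\; \left\langle \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} a \\ b \\ c \end{pmatrix} \right\rangle = 0,\ \left\langle \begin{pmatrix} 0 \\ 3 \\ -1 \end{pmatrix}, \begin{pmatrix} a \\ b \\ c \end{pmatrix} \right\rangle = 0 \right\} \\
&= \left\{ \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in \mathbb{R}^3 \;\middle|\; a + b + c = 0,\ 3b - c = 0 \right\} \\
&= \left\{ \begin{pmatrix} -4 \\ 1 \\ 3 \end{pmatrix} b \;\middle|\; b \in \mathbb{R} \right\} \\
&= \operatorname{Span}\left\{ \begin{pmatrix} -4 \\ 1 \\ 3 \end{pmatrix} \right\}.
\end{align*}

Fig. 7.8 The orthogonal complement $W^\perp$ of $W = \{(1, 1, 1)^T, (0, 3, -1)^T\}$ is the line perpendicular to both vectors in $W$. (See Example 7.2.9.)

We see $W^\perp$ in Figure 7.8.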
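In $\mathbb{R}^3$ specifically, these two orthogonal complements can be cross-checked with the cross product, a tool the text itself does not use here. The sketch below assumes the vectors read as $y = (1, -2, 3)^T$ with spanning vectors $(2, 1, 0)^T$ and $(-3, 0, 1)^T$ for Example 7.2.8, and $W = \{(1, 1, 1)^T, (0, 3, -1)^T\}$ for Example 7.2.9.

```python
def dot(u, v):
    """Standard inner product on R^3."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product in R^3; the result is orthogonal to both u and v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# Example 7.2.9: W-perp is the line spanned by the cross product.
w1, w2 = (1, 1, 1), (0, 3, -1)
n = cross(w1, w2)

# Example 7.2.8: both spanning vectors of {y}-perp are orthogonal to y,
# and their cross product points along y.
y = (1, -2, 3)
s1, s2 = (2, 1, 0), (-3, 0, 1)
checks = (dot(y, s1), dot(y, s2))
normal = cross(s1, s2)

print(n, checks, normal)
```

This shortcut only works in three dimensions; in general one solves the homogeneous system of inner product equations as in the examples.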


Example 7.2.10 Consider $\mathbb{R}^3$ as a subspace of itself. $(\mathbb{R}^3)^\perp$ is the set of all vectors in $\mathbb{R}^3$ which are orthogonal to every vector in $\mathbb{R}^3$. The only vector satisfying this very restrictive requirement is the zero vector. That is, $(\mathbb{R}^3)^\perp = \{0\}$.


The orthogonal complements in the examples above are all written as a span. Therefore, each is a 
subspace. In fact, it is always the case that the orthogonal complement is a subspace whether or not 
the set W is a subspace. We see this result in the next lemma. 


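Lemma 7.2.11
Let $V$ be an inner product space and $W \subseteq V$. Then $W^\perp$ is a subspace of $V$.


Proof. Suppose $V$ is an inner product space and $W \subseteq V$. We will show that (a) the zero vector of $V$, $0_V$, is in $W^\perp$ and (b) for every $u, v \in W^\perp$ and scalar $\alpha$, $\alpha u + v \in W^\perp$.

(a) First, notice that by Theorem 7.1.17, $\langle 0_V, y\rangle = 0$ for all $y \in W$, so $0_V \in W^\perp$.
(b) Now, let $u, v \in W^\perp$ and $\alpha$ be a scalar. Then, for all $y \in W$,

\[ \langle \alpha u + v, y\rangle = \alpha\langle u, y\rangle + \langle v, y\rangle = \alpha \cdot 0 + 0 = 0. \]

Thus, $\alpha u + v \in W^\perp$.
Therefore, by Theorem 2.5.3 and Corollary 2.5.6, $W^\perp$ is a subspace of $V$.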


If, in addition, $W$ is a subspace, then we create a natural decomposition of $V$ as the following direct sum.

Theorem 7.2.12
Let $W$ be a subspace of the inner product space $V$. Then $V = W \oplus W^\perp$.


Proof. Let $V$ be an inner product space and let $W \subseteq V$ be a subspace. We need to show that if $v \in W \cap W^\perp$, then $v$ must necessarily be the zero vector of $V$. Then, we need to show that $v$ can be uniquely written as $v = v_1 + v_2$ where $v_1 \in W$ and $v_2 \in W^\perp$. We leave the details of this argument to Exercise 12.


Corollary 7.2.13
Let $V$ be an inner product space and $W \subseteq V$. Then

\[ V = \operatorname{Span}(W) \oplus W^\perp. \]


Proof. Suppose $V$ is an inner product space and $W \subseteq V$. $\operatorname{Span}(W)$ is a subspace of $V$ and the result follows directly from Theorem 7.2.12 and Exercise 16.


We now have a natural decomposition of a vector space into the direct sum of a subspace $W$ and the orthogonal complement subspace $W^\perp$. Every vector $v \in V$ can be written $v = w + n$ where $w \in W$ and $n \in W^\perp$. Furthermore, $\langle w, n\rangle = 0$.

Now we can use $r = \|w\|^2 / \|v\|^2$ as a measure of how close $v$ is to being in $W$. If $v \in W$ then $w = v$ and $n = 0$. In this case, $r = 1$. If $v \in W^\perp$ then $w = 0$ and $n = v$. In this case, $r = 0$. In fact, because $w$ and $n$ are always orthogonal,

\[ r = \frac{\|w\|^2}{\|v\|^2} = \frac{\|w\|^2}{\|w + n\|^2} = \frac{\|w\|^2}{\|w\|^2 + \|n\|^2}, \]

so that $0 \le r \le 1$ for every nonzero $v$.


Definition 7.2.14

Let $W$ be a subspace of inner product space $V$. Let $v \in V$; then there are unique vectors $w \in W$ and $n \in W^\perp$ so that $v = w + n$. We say that $w$ is the orthogonal projection, or equivalently, the projection of $v$ onto $W$ and write $w = \operatorname{proj}_W(v)$.


In Definition 7.2.14, we say that $w \in W$ and $n \in W^\perp$ are unique. Suppose $v$ can be decomposed into $v = w_1 + n_1$ and $v = w_2 + n_2$. Then

\begin{align*}
0 &= \langle v - v, v - v\rangle \\
&= \langle (w_1 - w_2) + (n_1 - n_2), (w_1 - w_2) + (n_1 - n_2)\rangle \\
&= \langle w_1 - w_2, w_1 - w_2\rangle + 2\langle w_1 - w_2, n_1 - n_2\rangle + \langle n_1 - n_2, n_1 - n_2\rangle \\
&= \langle w_1 - w_2, w_1 - w_2\rangle + \langle n_1 - n_2, n_1 - n_2\rangle,
\end{align*}

where the middle term vanishes because $w_1 - w_2 \in W$ and $n_1 - n_2 \in W^\perp$.

Now, since $\langle w_1 - w_2, w_1 - w_2\rangle$ and $\langle n_1 - n_2, n_1 - n_2\rangle$ are nonnegative, they must both be zero. Therefore, $w_1 = w_2$ and $n_1 = n_2$. This shows the uniqueness assumed in Definition 7.2.14.

Fig. 7.9 Orthogonal projection onto a vector (or a line).

For clarity in notation, it is common to write $v_\perp$ instead of $n$, to indicate the component of $v$ orthogonal to $W$, and $v_\parallel$ instead of $w$, to indicate the component of $v$ that is within the space $W$.

Both orthogonal projections and coordinate projections result from decomposition of the inner product space $V$ into a direct sum $W_1 \oplus W_2$. The projections agree if and only if $W_2 = W_1^\perp$. For example, see Exercise 18.

Next, suppose we have a vector $v \in \mathbb{R}^n$ and we want to project $v$ onto a vector $y$ in $\mathbb{R}^n$. We are looking for a vector on the line made of all scalar multiples of $y$. That is, we project onto the subspace

\[ L = \{\alpha y : \alpha \in \mathbb{R}\}. \]

Then we are looking for the vector $w = \operatorname{proj}_L(v)$ shown in Figure 7.9. We now consider projecting a vector onto any other vector along the same line.


Projection onto a Line 


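We begin with $W = \operatorname{Span}\{u_1\}$, the span of only one vector. This means we want the projection of $v$ onto a line (also depicted in Figure 7.9). First, we recognize that we are looking for $w \in W$ so that $v = w + n$ where $n \in W^\perp$. Then notice, $w \in W$ means $w = a u_1$ for some scalar $a$. Next, $n \in W^\perp$ says that $\langle n, u_1\rangle = 0$. Putting these together with $v = w + n$, we see that

\[ \langle v, u_1\rangle = \langle a u_1 + n, u_1\rangle = a\langle u_1, u_1\rangle + \langle n, u_1\rangle = a\langle u_1, u_1\rangle. \]

Thus,

\[ a = \frac{\langle v, u_1\rangle}{\langle u_1, u_1\rangle} \quad \text{and, therefore,} \quad w = a u_1 = \frac{\langle v, u_1\rangle}{\langle u_1, u_1\rangle}\, u_1. \]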


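Let us now consider two examples.

Example 7.2.15 Consider $\mathbb{R}^3$ with the standard inner product. Let $W = \operatorname{Span}\{(2, 2, 1)^T\}$. We can find the orthogonal projection of $v = (1, 3, -1)^T \in \mathbb{R}^3$ onto $W$:

\[ \operatorname{proj}_W v = \frac{\langle (1, 3, -1)^T, (2, 2, 1)^T\rangle}{\langle (2, 2, 1)^T, (2, 2, 1)^T\rangle} \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} = \frac{7}{9}\begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 14/9 \\ 14/9 \\ 7/9 \end{pmatrix}. \]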
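The projection formula $w = \frac{\langle v, u_1\rangle}{\langle u_1, u_1\rangle} u_1$ translates directly into code. The following sketch uses the standard inner product on $\mathbb{R}^n$ and exact fractions; it assumes the data of Example 7.2.15 read as $u_1 = (2, 2, 1)^T$ and $v = (1, 3, -1)^T$. The name `proj_line` is illustrative.

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def proj_line(v, u):
    """Orthogonal projection of v onto Span{u}: (<v,u>/<u,u>) u."""
    if not any(u):
        raise ValueError("cannot project onto the zero vector")
    a = Fraction(dot(v, u), dot(u, u))
    return tuple(a * x for x in u)

u1 = (2, 2, 1)
v = (1, 3, -1)
w = proj_line(v, u1)                     # component of v in W
n = tuple(x - y for x, y in zip(v, w))   # component of v in W-perp

print(w)
print(dot(n, u1))  # zero when n is orthogonal to u1
```

The last line checks the defining property of the decomposition $v = w + n$: the remainder $n$ is orthogonal to the line.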


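Example 7.2.16 Find the orthogonal projection of
\[ f(x) = 2 - x + x^2 \in \mathcal{P}_2(\mathbb{R}) \]
onto $W = \operatorname{Span}\{1 - x\}$. We will use the standard inner product on $\mathcal{P}_2(\mathbb{R})$.

\begin{align*}
\operatorname{proj}_W f(x) &= \frac{\langle 2 - x + x^2, 1 - x\rangle}{\langle 1 - x, 1 - x\rangle}(1 - x) \\
&= \frac{\int_0^1 (2 - x + x^2)(1 - x)\,dx}{\int_0^1 (1 - x)^2\,dx}(1 - x) \\
&= \frac{11/12}{1/3}(1 - x) \\
&= \frac{11}{4}(1 - x).
\end{align*}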
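The two integrals in Example 7.2.16 can be evaluated exactly from polynomial coefficients. The sketch below assumes the standard inner product on $\mathcal{P}_2(\mathbb{R})$ is $\langle p, q\rangle = \int_0^1 p(x)q(x)\,dx$, and it stores a polynomial as a list of coefficients in increasing degree. It uses the fact that $\int_0^1 x^{i+j}\,dx = 1/(i + j + 1)$.

```python
from fractions import Fraction

def inner(p, q):
    """<p, q> = integral over [0, 1] of p(x) q(x) dx.

    p and q are coefficient lists, lowest degree first.
    """
    return sum(Fraction(a * b, i + j + 1)
               for i, a in enumerate(p)
               for j, b in enumerate(q))

f = [2, -1, 1]   # 2 - x + x^2
g = [1, -1]      # 1 - x

coef = inner(f, g) / inner(g, g)
proj = [coef * c for c in g]   # coefficients of proj_W f

print(inner(f, g), inner(g, g), coef, proj)
```

The same `inner` function can be reused for any pair of polynomials with integer coefficients, for example to check the orthogonality of the bases in Exercise 22.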


In these examples, we found the scalar multiple of a vector that is closest to another given vector. 
We now consider projection onto a subspace. That is, we find a vector in a subspace that is closest to 
a given vector. 


Projection onto a Subspace 


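The scenario is illustrated in Figure 7.10 for a two-dimensional subspace. Suppose we wish to project a vector $v$ onto a $k$-dimensional subspace $W$ with basis $\beta_1 = \{u_1, u_2, \ldots, u_k\}$. We know that every vector in $W^\perp$ is orthogonal to every vector in $W$. Thus, given a basis for $W^\perp$, $\beta_2 = \{u_{k+1}, \ldots, u_n\}$, we know that $\langle u_i, u_j\rangle = 0$ for all $u_i \in \beta_1$ and $u_j \in \beta_2$.

Fig. 7.10 Orthogonal projection onto a subspace (in this case, a plane).

Now, the projection is given by the coordinates $(a_1, a_2, \ldots, a_k)^T = [\operatorname{proj}_W v]_{\beta_1}$. From Theorem 7.1.29, we have

\[
\begin{pmatrix}
\langle u_1, u_1\rangle & \cdots & \langle u_1, u_k\rangle & & & \\
\vdots & \ddots & \vdots & & 0 & \\
\langle u_k, u_1\rangle & \cdots & \langle u_k, u_k\rangle & & & \\
 & & & \langle u_{k+1}, u_{k+1}\rangle & \cdots & \langle u_{k+1}, u_n\rangle \\
 & 0 & & \vdots & \ddots & \vdots \\
 & & & \langle u_n, u_{k+1}\rangle & \cdots & \langle u_n, u_n\rangle
\end{pmatrix}
\begin{pmatrix} a_1 \\ \vdots \\ a_k \\ a_{k+1} \\ \vdots \\ a_n \end{pmatrix}
=
\begin{pmatrix} \langle u_1, v\rangle \\ \vdots \\ \langle u_k, v\rangle \\ \langle u_{k+1}, v\rangle \\ \vdots \\ \langle u_n, v\rangle \end{pmatrix}.
\]

Notice that this matrix equation can be separated into two matrix equations because of its block matrix structure. That is, we can find $a_1, a_2, \ldots, a_k$ by solving the matrix equation

\[
\begin{pmatrix}
\langle u_1, u_1\rangle & \cdots & \langle u_1, u_k\rangle \\
\vdots & \ddots & \vdots \\
\langle u_k, u_1\rangle & \cdots & \langle u_k, u_k\rangle
\end{pmatrix}
\begin{pmatrix} a_1 \\ \vdots \\ a_k \end{pmatrix}
=
\begin{pmatrix} \langle u_1, v\rangle \\ \vdots \\ \langle u_k, v\rangle \end{pmatrix}.
\tag{7.5}
\]

And, we can find $a_{k+1}, a_{k+2}, \ldots, a_n$ by solving the matrix equation

\[
\begin{pmatrix}
\langle u_{k+1}, u_{k+1}\rangle & \cdots & \langle u_{k+1}, u_n\rangle \\
\vdots & \ddots & \vdots \\
\langle u_n, u_{k+1}\rangle & \cdots & \langle u_n, u_n\rangle
\end{pmatrix}
\begin{pmatrix} a_{k+1} \\ \vdots \\ a_n \end{pmatrix}
=
\begin{pmatrix} \langle u_{k+1}, v\rangle \\ \vdots \\ \langle u_n, v\rangle \end{pmatrix}.
\]

Notice that the solution to Equation 7.5 gives us the projection onto the subspace spanned by the vectors in $\beta_1$.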

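Example 7.2.17 In this example, consider $v = (1, 0, 3)^T \in \mathbb{R}^3$. We will find the orthogonal projection of $v$ onto the subspace

\[ W = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix}, \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} \right\}. \]

A basis for $W$ is

\[ \beta_1 = \left\{ u_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},\ u_2 = \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} \right\}. \]

If we write $\operatorname{proj}_W v = a_1 u_1 + a_2 u_2$, then we have the system of equations

\[ \begin{pmatrix} \langle u_1, u_1\rangle & \langle u_1, u_2\rangle \\ \langle u_2, u_1\rangle & \langle u_2, u_2\rangle \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} \langle u_1, v\rangle \\ \langle u_2, v\rangle \end{pmatrix}. \]

Computing the inner products gives

\[ \begin{pmatrix} 2 & 3 \\ 3 & 10 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 6 \end{pmatrix}. \]

The matrix equation has solution $a = (-8/11,\ 9/11)^T$. So,

\[ \operatorname{proj}_W v = a_1 u_1 + a_2 u_2 = -\frac{8}{11}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \frac{9}{11}\begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 19/11 \\ -8/11 \\ 9/11 \end{pmatrix}. \]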
To find the projection of a vector $v$ onto any subspace $W = \operatorname{Span}\{u_1, u_2, \ldots, u_k\}$, we make use of the fact that $W$ and $W^\perp$ are orthogonal sets. We can solve for either $\operatorname{proj}_W(v)$ or $\operatorname{proj}_{W^\perp}(v)$, whichever is simpler, and then use the fact that $v = \operatorname{proj}_W(v) + \operatorname{proj}_{W^\perp}(v)$ to solve for the other.

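In Example 7.2.17, we could have, instead, found $\operatorname{proj}_W v$ by first projecting $v$ onto $W^\perp$ and then subtracting from $v$. To see this, we first compute $W^\perp$,

\begin{align*}
W^\perp &= \{u \in \mathbb{R}^3 \mid \langle u, u_1\rangle = 0 \text{ and } \langle u, u_2\rangle = 0\} \\
&= \left\{ \begin{pmatrix} a \\ b \\ c \end{pmatrix} \;\middle|\; a + b = 0,\ 3a + c = 0 \right\} \\
&= \left\{ \begin{pmatrix} 1 \\ -1 \\ -3 \end{pmatrix} a \;\middle|\; a \in \mathbb{R} \right\}.
\end{align*}

Let $w' = (1, -1, -3)^T$; then we know that

\[ \operatorname{proj}_{W^\perp}(v) = \operatorname{proj}_{w'}(v) = \frac{\langle v, w'\rangle}{\langle w', w'\rangle}\, w' = -\frac{8}{11}\begin{pmatrix} 1 \\ -1 \\ -3 \end{pmatrix}. \]

Now find $\operatorname{proj}_W(v)$ as described above.

\[ \operatorname{proj}_W(v) = v - \operatorname{proj}_{W^\perp}(v) = \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} + \frac{8}{11}\begin{pmatrix} 1 \\ -1 \\ -3 \end{pmatrix} = \begin{pmatrix} 19/11 \\ -8/11 \\ 9/11 \end{pmatrix}, \]

which is the same as our result in Example 7.2.17.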
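Both routes to the projection in Example 7.2.17 can be compared in exact arithmetic. This is a sketch under the assumption that the data read as $u_1 = (1, 1, 0)^T$, $u_2 = (3, 0, 1)^T$, $w' = (1, -1, -3)^T$, and $v = (1, 0, 3)^T$. The helper `proj_plane` is a hypothetical name; it solves the $2 \times 2$ case of Equation 7.5 by Cramer's rule.

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product on R^n, in exact arithmetic."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

def proj_plane(v, u1, u2):
    """Project v onto Span{u1, u2} by solving the 2x2 system of Equation 7.5."""
    g11, g12, g22 = dot(u1, u1), dot(u1, u2), dot(u2, u2)
    r1, r2 = dot(u1, v), dot(u2, v)
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("u1 and u2 are linearly dependent")
    a1 = (r1 * g22 - g12 * r2) / det
    a2 = (g11 * r2 - g12 * r1) / det
    return (a1, a2), tuple(a1 * x + a2 * y for x, y in zip(u1, u2))

v = (1, 0, 3)
u1, u2 = (1, 1, 0), (3, 0, 1)
coeffs, direct = proj_plane(v, u1, u2)

# Second route: project onto W-perp = Span{w'} and subtract from v.
wp = (1, -1, -3)
c = dot(v, wp) / dot(wp, wp)
via_complement = tuple(x - c * y for x, y in zip(v, wp))

print(coeffs, direct, via_complement)
```

When $W^\perp$ has smaller dimension than $W$, as here, the second route needs only one inner product ratio instead of a linear solve.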


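Example 7.2.18 Consider the three images of Example 7.1.6 (also shown below). Suppose images $y$ and $z$ are two known examples of traffic density distributions in the inner city that correlate with a significant increase in evening commute times. We wish to understand if another density distribution, say $x$, exhibits similar density properties. To answer this question, we begin by computing the projection of $x$ onto the subspace $W = \operatorname{Span}\{y, z\}$.

[Images $x$, $y$, and $z$]

with numerical grayscale entries

\[
x = \begin{pmatrix} 95 & 89 & 82 & 92 \\ 23 & 76 & 44 & 74 \\ 61 & 46 & 62 & 18 \\ 49 & 2 & 79 & 41 \end{pmatrix}, \quad
y = \begin{pmatrix} 94 & 6 & 14 & 27 \\ 92 & 35 & 20 & 20 \\ 41 & 81 & 20 & 2 \\ 89 & 1 & 60 & 75 \end{pmatrix}, \quad
z = \begin{pmatrix} 102 & 22 & 30 & 23 \\ 109 & 45 & 21 & 29 \\ 50 & 85 & 33 & 15 \\ 97 & 14 & 67 & 83 \end{pmatrix}.
\]

Writing $\operatorname{proj}_W x = a_1 y + a_2 z$, Equation 7.5 becomes

\[ \begin{pmatrix} \langle y, y\rangle & \langle y, z\rangle \\ \langle z, y\rangle & \langle z, z\rangle \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} \langle y, x\rangle \\ \langle z, x\rangle \end{pmatrix}. \]

Computing the inner products (see Example 7.1.6) gives

\[ \begin{pmatrix} 46{,}079 & 51{,}881 \\ 51{,}881 & 59{,}527 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 39{,}913 \\ 47{,}974 \end{pmatrix} \]

with solution $(a_1, a_2)^T \approx (-2.203, +2.726)^T$. So, $x = x_\parallel + x_\perp$, where

\[
x_\parallel = \operatorname{proj}_W x \approx \begin{pmatrix} 71 & 47 & 51 & 3 \\ 94 & 46 & 13 & 35 \\ 46 & 53 & 46 & 36 \\ 68 & 36 & 50 & 61 \end{pmatrix}, \quad
x_\perp = \operatorname{proj}_{W^\perp}(x) \approx \begin{pmatrix} 24 & 42 & 31 & 89 \\ -71 & 30 & 31 & 39 \\ 15 & -7 & 16 & -18 \\ -19 & -34 & 29 & -20 \end{pmatrix}.
\]

We have $\|\operatorname{proj}_W x\| \approx 198$ and $\|x_\perp\| \approx 121$. We also have $\|x_\parallel\| / \|x\| \approx 0.853$. So we see that $x$ has a non-trivial projection onto $\operatorname{Span}\{y, z\}$, but perhaps not significant from a traffic congestion standpoint.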
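The inner products and coefficients in Example 7.2.18 can be recomputed from the grayscale entries. The sketch below is an assumption-laden check, not the authors' code: it takes the three $4 \times 4$ images to have the entries listed in the arrays below (read row by row) and uses the entrywise sum of products as the image inner product.

```python
def dot(u, v):
    """Image inner product: sum of entrywise products."""
    return sum(a * b for a, b in zip(u, v))

# 4x4 images flattened row by row (assumed reading of the grayscale entries).
x = [95, 89, 82, 92, 23, 76, 44, 74, 61, 46, 62, 18, 49, 2, 79, 41]
y = [94, 6, 14, 27, 92, 35, 20, 20, 41, 81, 20, 2, 89, 1, 60, 75]
z = [102, 22, 30, 23, 109, 45, 21, 29, 50, 85, 33, 15, 97, 14, 67, 83]

# Entries of the 2x2 system (Equation 7.5 with basis {y, z}).
gyy, gyz, gzz = dot(y, y), dot(y, z), dot(z, z)
ryx, rzx = dot(y, x), dot(z, x)

# Solve by Cramer's rule.
det = gyy * gzz - gyz * gyz
a1 = (ryx * gzz - gyz * rzx) / det
a2 = (gyy * rzx - gyz * ryx) / det

# Projection of x onto Span{y, z} and the orthogonal remainder.
x_par = [a1 * p + a2 * q for p, q in zip(y, z)]
x_perp = [p - q for p, q in zip(x, x_par)]

print(gyy, gyz, gzz, ryx, rzx)
print(round(a1, 3), round(a2, 3))
```

Up to floating-point rounding, `x_perp` is orthogonal to both `y` and `z`, which is a useful sanity check on any projection computed this way.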


Matrix Representations of Orthogonal Projection Transformations 


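Orthogonal projection of a vector $v \in \mathbb{R}^n$ onto any subspace $W$ is determined by the solution to Equation 7.5. We need only have a basis, $\beta = \{u_1, u_2, \ldots, u_k\}$, for the subspace and an inner product $\langle\cdot,\cdot\rangle$. Let $U$ be the matrix whose columns are the basis vectors. Notice that, for the standard inner product on $\mathbb{R}^n$,

\[
U^T U = \begin{pmatrix} - & u_1^T & - \\ & \vdots & \\ - & u_k^T & - \end{pmatrix} \begin{pmatrix} | & & | \\ u_1 & \cdots & u_k \\ | & & | \end{pmatrix} = \begin{pmatrix} \langle u_1, u_1\rangle & \cdots & \langle u_1, u_k\rangle \\ \vdots & \ddots & \vdots \\ \langle u_k, u_1\rangle & \cdots & \langle u_k, u_k\rangle \end{pmatrix}
\]
and
\[
U^T v = \begin{pmatrix} - & u_1^T & - \\ & \vdots & \\ - & u_k^T & - \end{pmatrix} \begin{pmatrix} | \\ v \\ | \end{pmatrix} = \begin{pmatrix} \langle u_1, v\rangle \\ \vdots \\ \langle u_k, v\rangle \end{pmatrix}.
\]

Combining this with Equation 7.5, we see that the projection coordinates, $a_1, a_2, \ldots, a_k$, satisfy $U^T U a = U^T v$. The following theorem shows how one can obtain a projection vector as a series of left matrix multiplication operations, a method that is especially useful for computational tasks on a computer.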


Theorem 7.2.19
Let $W$ be a $k$-dimensional subspace of $\mathbb{R}^n$ and $v \in \mathbb{R}^n$. Let $\{u_1, u_2, \ldots, u_k\}$ be a basis for $W$ and $U$ the matrix whose columns are $u_1, u_2, \ldots, u_k$. Then

\[ \operatorname{proj}_W v = U(U^T U)^{-1} U^T v. \]


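Proof. Let $W$ be a subspace of $\mathbb{R}^n$ and let $v \in \mathbb{R}^n$. Define $\mathcal{B} = \{u_1, u_2, \ldots, u_k\}$ to be a basis for $W$. Suppose $U$ is the matrix whose columns are $u_1, u_2, \ldots, u_k$. Then $\operatorname{Rank}(U) = k$. Therefore, $\operatorname{Row}(U) = \mathbb{R}^k$. Now, by the Rank-Nullity Theorem (Theorem 5.1.30), we know that $\operatorname{Nullity}(U^T) = n - k$. Let $\{u_{k+1}, u_{k+2}, \ldots, u_n\}$ be a basis for $\operatorname{Null}(U^T)$. Because $\operatorname{Null}(U^T)$ is the orthogonal complement of $\operatorname{Col}(U)$, the set $\{u_1, u_2, \ldots, u_n\}$ is a basis for $\mathbb{R}^n$. By Theorem 5.2.23, we know that the transformation $T : \mathbb{R}^k \to \mathbb{R}^n$, defined by left multiplication by $U$, is one-to-one and $S : \mathbb{R}^n \to \mathbb{R}^k$, defined by left multiplication by $U^T$, is onto. We now show that $S \circ T : \mathbb{R}^k \to \mathbb{R}^k$ is onto and therefore invertible. Notice that

\[
\operatorname{Ran}(S) = \operatorname{Col}(U^T) = \operatorname{Row}(U) = \mathbb{R}^k.
\]

That is, if $u \in \mathbb{R}^k$, then there is $w \in \mathbb{R}^n$ so that $S(w) = u$ and

\[
w = \alpha_1 u_1 + \ldots + \alpha_k u_k + \alpha_{k+1} u_{k+1} + \ldots + \alpha_n u_n.
\]

Since $S(u_j) = U^T u_j = 0$ for $j > k$, we have $S(\hat{w}) = u$ for $\hat{w} = \alpha_1 u_1 + \ldots + \alpha_k u_k \in \operatorname{Col}(U)$. Notice, also,

\[
\operatorname{Ran}(T) = \operatorname{Col}(U).
\]

Therefore, there is $\alpha \in \mathbb{R}^k$ so that $T(\alpha) = \hat{w}$. That is,

\[
(S \circ T)(\alpha) = S(T(\alpha)) = S(\hat{w}) = u
\]

and $S \circ T$ is onto. That is, $\operatorname{Ran}(S \circ T) = \mathbb{R}^k$. Thus, $S \circ T$ is invertible. Therefore, by Theorems 4.4.14 and 5.2.21, $U^T U$ is invertible. We know that $U^T U a = U^T v$, so $a = (U^T U)^{-1} U^T v$. Now $[\operatorname{proj}_W v]_{\mathcal{B}} = a$, so

\[
\operatorname{proj}_W v = U a = U (U^T U)^{-1} U^T v.
\]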
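The formula of Theorem 7.2.19 can be tried numerically. The following is a minimal sketch, assuming NumPy is available; the function name `project_onto_column_space` and the sample data (a 3 x 2 matrix $U$ with independent columns and $v = (1, 2, 3)^T$) are illustrative choices, not part of the text. Rather than forming $(U^T U)^{-1}$ explicitly, the sketch solves the normal equations $U^T U a = U^T v$ for the coordinates $a$ and then returns $Ua$.

```python
import numpy as np

def project_onto_column_space(U, v):
    """Orthogonal projection of v onto Col(U).

    Assumes the columns of U are linearly independent, so U^T U is invertible.
    """
    U = np.asarray(U, dtype=float)
    v = np.asarray(v, dtype=float)
    # Solve the normal equations (U^T U) a = U^T v for the coordinate vector a.
    a = np.linalg.solve(U.T @ U, U.T @ v)
    # The projection is the linear combination of the columns with weights a.
    return U @ a

# Illustrative data: two independent, non-orthogonal columns in R^3.
U = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 2.0, 3.0])

p = project_onto_column_space(U, v)
residual = v - p  # should be orthogonal to every column of U
print(p, U.T @ residual)
```

Solving the linear system is usually preferred over computing the inverse, since it does the same job with less rounding error; the residual check confirms that $v - \operatorname{proj}_W v$ is orthogonal to the basis vectors.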


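Consider again the matrix representation of the projection transformation. Suppose we want to find an orthogonal projection of a vector $v \in \mathbb{R}^n$ onto a $k$-dimensional subspace $W$. Notice that if we have an orthogonal basis $\mathcal{B} = \{u_1, u_2, \ldots, u_k\}$ for $W$, then

\[
U^T U =
\begin{pmatrix}
\langle u_1, u_1 \rangle & \cdots & \langle u_1, u_k \rangle \\
\langle u_2, u_1 \rangle & \cdots & \langle u_2, u_k \rangle \\
\vdots & & \vdots \\
\langle u_k, u_1 \rangle & \cdots & \langle u_k, u_k \rangle
\end{pmatrix}
=
\begin{pmatrix}
\|u_1\|^2 & 0 & \cdots & 0 \\
0 & \|u_2\|^2 & & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & \|u_k\|^2
\end{pmatrix}.
\]

Therefore, since $u_j$ is not the zero vector for any $1 \le j \le k$, we have

\[
(U^T U)^{-1} =
\begin{pmatrix}
1/\|u_1\|^2 & 0 & \cdots & 0 \\
0 & 1/\|u_2\|^2 & & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & 1/\|u_k\|^2
\end{pmatrix}.
\]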


The next two corollaries tell us how the projection transformation is simplified when the subspace 
basis is orthogonal or orthonormal. 


Corollary 7.2.20

Let $W$ be a $k$-dimensional subspace of $\mathbb{R}^n$ and $v \in \mathbb{R}^n$. Let $\{u_1, u_2, \ldots, u_k\}$ be an orthogonal basis for $W$ and $U$ the matrix whose columns are $u_1, u_2, \ldots, u_k$. Then $\operatorname{proj}_W v = \hat{U} \hat{U}^T v$, where $\hat{U}$ is the $n \times k$ matrix whose columns are the normalized basis vectors.

The proof is the subject of Exercise 13.

Corollary 7.2.21
Let $W$ be a $k$-dimensional subspace of $\mathbb{R}^n$ and $v \in \mathbb{R}^n$. Let $\{u_1, u_2, \ldots, u_k\}$ be an orthonormal basis for $W$ and $U$ the matrix whose columns are $u_1, u_2, \ldots, u_k$. Then $\operatorname{proj}_W v = U U^T v$.

Proof. Suppose $W$ is a subspace of $\mathbb{R}^n$ and $v \in \mathbb{R}^n$, $\{u_1, u_2, \ldots, u_k\}$ is an orthonormal basis for $W$, and $U$ is the matrix whose columns are $u_1, u_2, \ldots, u_k$. Then, the result follows from Corollary 7.2.20 by noting that each basis vector is a unit vector.


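Example 7.2.22 Let $v = (1, 2, 3)^T \in \mathbb{R}^3$ and define

\[
W = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix} \right\}.
\]

We can find $\operatorname{proj}_W v$ using Theorem 7.2.19. Note that the spanning set is linearly independent, but not orthogonal, so we define

\[
U = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 2 \end{pmatrix}.
\]

Then

\[
U^T U = \begin{pmatrix} 2 & 4 \\ 4 & 9 \end{pmatrix}
\quad \text{and} \quad
(U^T U)^{-1} = \begin{pmatrix} 9/2 & -2 \\ -2 & 1 \end{pmatrix}.
\]

So, we have

\[
U (U^T U)^{-1} U^T = \frac{1}{2} \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{pmatrix}.
\]

Therefore,

\[
\operatorname{proj}_W v = \frac{1}{2} \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
= \frac{1}{2} \begin{pmatrix} 4 \\ 4 \\ 4 \end{pmatrix}
= \begin{pmatrix} 2 \\ 2 \\ 2 \end{pmatrix}.
\]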
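Example 7.2.23 Now, let $v = (1, 2, 3)^T \in \mathbb{R}^3$ and

\[
W = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ -2 \end{pmatrix} \right\}.
\]

We can find $\operatorname{proj}_W v$ using Corollary 7.2.20 because the basis for $W$ is an orthogonal basis. We define

\[
U = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & -2 \end{pmatrix}.
\]

So,

\[
\hat{U} = \begin{pmatrix} 1/\sqrt{2} & 2/3 \\ 0 & 1/3 \\ 1/\sqrt{2} & -2/3 \end{pmatrix}.
\]

Then,

\[
\operatorname{proj}_W v = \hat{U} \hat{U}^T v
= \begin{pmatrix} 1/\sqrt{2} & 2/3 \\ 0 & 1/3 \\ 1/\sqrt{2} & -2/3 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 2/3 & 1/3 & -2/3 \end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
= \frac{1}{9} \begin{pmatrix} 14 \\ -2 \\ 22 \end{pmatrix}.
\]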


If, in Example 7.2.23, we had the orthonormal basis

\[
\mathcal{B} = \left\{ \begin{pmatrix} 1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{pmatrix}, \begin{pmatrix} 2/3 \\ 1/3 \\ -2/3 \end{pmatrix} \right\},
\]

Corollary 7.2.21 tells us that we can directly use $U$, as its columns are already normalized.
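The shortcut in Corollaries 7.2.20 and 7.2.21 can be compared against the general formula of Theorem 7.2.19. The sketch below assumes NumPy; the helper name `normalize_columns` is hypothetical, and the data are the orthogonal vectors $(1, 0, 1)^T$ and $(2, 1, -2)^T$ with $v = (1, 2, 3)^T$.

```python
import numpy as np

def normalize_columns(U):
    """Return a copy of U whose columns are scaled to unit length."""
    U = np.asarray(U, dtype=float)
    return U / np.linalg.norm(U, axis=0)

# Columns are orthogonal but not unit length.
U = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, -2.0]])
v = np.array([1.0, 2.0, 3.0])

# Orthogonality check: the off-diagonal entry of U^T U is zero.
columns_are_orthogonal = np.isclose(U[:, 0] @ U[:, 1], 0.0)

# Corollary 7.2.20: normalize the columns, then apply U_hat U_hat^T.
U_hat = normalize_columns(U)
p_short = U_hat @ (U_hat.T @ v)

# Theorem 7.2.19: the general formula, valid for any basis of W.
p_general = U @ np.linalg.solve(U.T @ U, U.T @ v)

print(columns_are_orthogonal, p_short, p_general)
```

Both routes give the same vector; the orthonormal version needs only two matrix-vector products and no linear solve.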


7.2.3. Gram-Schmidt Process 


We have seen that orthonormal bases for vector spaces are useful in quantifying how close a vector 
is to being within a subspace of interest. We have also seen that projections, which help us quantify 
“closeness,” are most easily computed and understood in terms of orthonormal bases. In this section, 
we explore the Gram-Schmidt process for constructing an orthonormal basis from an arbitrary basis. 

First, consider the following series of pictures showing how this process works in $\mathbb{R}^3$. Suppose we have a basis $C = \{v_1, v_2, v_3\}$ (Figure 7.11) and we would like an orthonormal basis $\mathcal{B} = \{u_1, u_2, u_3\}$, with one vector pointing in the direction of $v_1$. We first find an orthogonal basis and normalize the vectors later.

We begin building the orthogonal basis by starting with the set $\mathcal{B} = \{v_1\}$. This set will get larger as we add orthogonal vectors to it. Notice that, currently, $\mathcal{B}$ is an orthogonal (and, therefore, linearly independent) set (Figure 7.12), but $\mathcal{B}$ is not a basis for $\mathbb{R}^3$. We rename $v_1$ as $u_1$ to make clear that we are creating a new basis.

We use $v_2$ to obtain a vector orthogonal to $u_1$ by subtracting, from $v_2$, the component of $v_2$ that is parallel to $u_1$. (Recall, the vector we subtract is $\operatorname{proj}_{u_1} v_2$.) When we do this, all that will be left is the component of $v_2$ orthogonal to $u_1$ (Figure 7.13).


Fig. 7.11 The set $C = \{v_1, v_2, v_3\}$ is a basis for $\mathbb{R}^3$.

Fig. 7.12 Step 1: Set $u_1 = v_1$.

Fig. 7.13 Step 2: Set $u_2$ equal to the component of $v_2$ that is orthogonal to $u_1$ (recall, vectors make more sense drawn with tail at the origin).

Fig. 7.14 The vectors $u_1$ and $u_2$ of the orthogonal basis $\mathcal{B}$.

We label this leftover vector $u_2$ and include it in the set $\mathcal{B}$. So now

\[
\mathcal{B} = \{u_1, u_2\}
\]

is an orthogonal set, but still not a basis for $\mathbb{R}^3$. We can see the current status in Figure 7.14.

To find $u_3$, we repeat the process with $v_3$, but removing the components parallel to each of $u_1$ and $u_2$ as in Figures 7.15 and 7.16. That is, we find

\[
u_3 = v_3 - \operatorname{proj}_{u_1} v_3 - \operatorname{proj}_{u_2} v_3.
\]

We put $u_3$ into $\mathcal{B}$ to obtain the orthogonal basis

\[
\mathcal{B} = \{u_1, u_2, u_3\}
\]

for $\mathbb{R}^3$, seen in Figure 7.17.
Now, to normalize each vector, we divide by its length. The resulting set

\[
\left\{ \frac{u_1}{\|u_1\|}, \frac{u_2}{\|u_2\|}, \frac{u_3}{\|u_3\|} \right\}
\]

is an orthonormal basis for $\mathbb{R}^3$. We now give an example.
Fig. 7.15 We subtract $\operatorname{proj}_{u_2}(v_3)$ from $v_3$ to get a vector orthogonal to $u_2$.

Fig. 7.16 Step 3: We subtract $\operatorname{proj}_{u_1}(v_3)$ from $v_3 - \operatorname{proj}_{u_2}(v_3)$ to get a vector orthogonal to both $u_1$ and $u_2$. Set $u_3$ to be this orthogonal vector.

Fig. 7.17 The vectors $u_1$, $u_2$ and $u_3$ of the orthogonal basis $\mathcal{B}$.
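Example 7.2.24 Given a basis

\[
v_1 = \begin{pmatrix} 1 \\ 2 \\ -2 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} 4 \\ 4 \\ -3 \end{pmatrix}, \quad
v_3 = \begin{pmatrix} -1 \\ 7 \\ 2 \end{pmatrix}
\]

for $\mathbb{R}^3$, find an orthonormal basis containing one vector in the same direction as $v_1$. As above, we keep $v_1$ and call it $u_1$:

\[
u_1 = v_1 = \begin{pmatrix} 1 \\ 2 \\ -2 \end{pmatrix}.
\]

Next, we find $u_2$ using the process described above with $v_2$. That is,

\[
u_2 = v_2 - \operatorname{proj}_{u_1}(v_2) = v_2 - \frac{\langle v_2, u_1 \rangle}{\langle u_1, u_1 \rangle} u_1
= \begin{pmatrix} 4 \\ 4 \\ -3 \end{pmatrix} - \frac{18}{9} \begin{pmatrix} 1 \\ 2 \\ -2 \end{pmatrix}
= \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}.
\]

Finally, we find $u_3$ as suggested above.

\[
\begin{aligned}
u_3 &= v_3 - \operatorname{proj}_{u_1}(v_3) - \operatorname{proj}_{u_2}(v_3) \\
&= v_3 - \frac{\langle v_3, u_1 \rangle}{\langle u_1, u_1 \rangle} u_1 - \frac{\langle v_3, u_2 \rangle}{\langle u_2, u_2 \rangle} u_2 \\
&= \begin{pmatrix} -1 \\ 7 \\ 2 \end{pmatrix} - \frac{9}{9} \begin{pmatrix} 1 \\ 2 \\ -2 \end{pmatrix} - \frac{0}{5} \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} \\
&= \begin{pmatrix} -2 \\ 5 \\ 4 \end{pmatrix}.
\end{aligned}
\]

Therefore, an orthonormal basis with one vector in the direction of $v_1$ is

\[
\left\{ \frac{u_1}{\|u_1\|}, \frac{u_2}{\|u_2\|}, \frac{u_3}{\|u_3\|} \right\}
= \left\{
\begin{pmatrix} 1/3 \\ 2/3 \\ -2/3 \end{pmatrix},
\begin{pmatrix} 2/\sqrt{5} \\ 0 \\ 1/\sqrt{5} \end{pmatrix},
\begin{pmatrix} -2/\sqrt{45} \\ 5/\sqrt{45} \\ 4/\sqrt{45} \end{pmatrix}
\right\}.
\]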
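The procedure can be written as a short routine. This is a sketch under stated assumptions: it uses NumPy and the standard dot product on $\mathbb{R}^n$, the function name `gram_schmidt` and the tolerance `tol` are illustrative, and the input vectors $v_1 = (1, 2, -2)^T$, $v_2 = (4, 4, -3)^T$, $v_3 = (-1, 7, 2)^T$ are chosen so that the orthogonal vectors come out with integer entries.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthogonal list spanning the same space as `vectors`.

    Each new vector is the input vector minus its projections onto the
    orthogonal vectors found so far (the classical Gram-Schmidt update).
    """
    basis = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        u = v - sum(((b @ v) / (b @ b)) * b for b in basis)
        if np.linalg.norm(u) < tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append(u)
    return basis

v1, v2, v3 = [1, 2, -2], [4, 4, -3], [-1, 7, 2]
u1, u2, u3 = gram_schmidt([v1, v2, v3])

# Normalize and stack as columns; Q^T Q should be the identity.
Q = np.column_stack([u / np.linalg.norm(u) for u in (u1, u2, u3)])
print(u1, u2, u3)
print(np.round(Q.T @ Q, 12))
```

In floating-point work with many nearly dependent vectors, the classical update can lose orthogonality; subtracting each projection from the running remainder (modified Gram-Schmidt) or using a QR factorization is a common alternative.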


The next theorem formalizes this procedure, called the Gram-Schmidt process. 
Theorem 7.2.25 [Gram-Schmidt]
Let $V$ be an inner product space with basis $\{v_1, v_2, \ldots, v_n\}$. Set

\[
u_1 = v_1
\]

and

\[
u_k = v_k - \sum_{j=1}^{k-1} \frac{\langle u_j, v_k \rangle}{\langle u_j, u_j \rangle} u_j, \quad \text{for } k = 2, 3, \ldots, n.
\]

Then $\{u_1, u_2, \ldots, u_n\}$ is an orthogonal basis for $V$.
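Proof (by induction). Suppose $V$ is an inner product space with basis $\{v_1, v_2, \ldots, v_n\}$. Suppose, also, that $u_1 = v_1$ and

\[
u_k = v_k - \sum_{j=1}^{k-1} \frac{\langle u_j, v_k \rangle}{\langle u_j, u_j \rangle} u_j, \quad \text{for } k = 2, 3, \ldots, n.
\]

Define $\mathcal{B} = \{u_1, u_2, \ldots, u_n\}$. We will show that $\mathcal{B}$ is an orthogonal set, and so by Corollary 7.1.23, $\mathcal{B}$ is an orthogonal basis for $V$.

Let $\mathcal{B}_k = \{u_1, u_2, \ldots, u_k\}$. Notice that $\mathcal{B}_1 = \{u_1\}$ has only one element and is, therefore, an orthogonal set. Now, suppose $\mathcal{B}_k$ is an orthogonal set for some $k < n$. We show that $\mathcal{B}_{k+1}$ is also an orthogonal set. Let $W_k$ be the set spanned by $\mathcal{B}_k$. Notice that

\[
W_k = \operatorname{Span}\{u_1, u_2, \ldots, u_k\} = \operatorname{Span}\{v_1, v_2, \ldots, v_k\}.
\]

Since $v_{k+1} \notin W_k$, we know that $u_{k+1} \neq 0$. Also, by properties of $\langle \cdot, \cdot \rangle$ and the assumption that $\mathcal{B}_k$ is orthogonal, we have, for $1 \le \ell \le k$, that

\[
\begin{aligned}
\langle u_{k+1}, u_\ell \rangle
&= \left\langle v_{k+1} - \sum_{j=1}^{k} \frac{\langle u_j, v_{k+1} \rangle}{\langle u_j, u_j \rangle} u_j, \; u_\ell \right\rangle \\
&= \langle v_{k+1}, u_\ell \rangle - \sum_{j=1}^{k} \frac{\langle u_j, v_{k+1} \rangle}{\langle u_j, u_j \rangle} \langle u_j, u_\ell \rangle \\
&= \langle v_{k+1}, u_\ell \rangle - \frac{\langle u_\ell, v_{k+1} \rangle}{\langle u_\ell, u_\ell \rangle} \langle u_\ell, u_\ell \rangle \\
&= 0.
\end{aligned}
\]

Therefore, $u_{k+1}$ is orthogonal to each vector $u_1, u_2, \ldots, u_k$. Thus, $\mathcal{B}_{k+1}$ is an orthogonal set. By the principle of mathematical induction, $\mathcal{B}$ is orthogonal.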
Corollary 7.2.26

Let $\{v_1, v_2, \ldots, v_n\}$ be a basis for $\mathbb{R}^n$. Define

\[
u_1 = v_1
\]

and

\[
u_k = v_k - \hat{U}_{k-1} \hat{U}_{k-1}^T v_k, \quad \text{for } k = 2, 3, \ldots, n,
\]

where $\hat{U}_j$ is the matrix with columns $u_1/\|u_1\|, u_2/\|u_2\|, \ldots, u_j/\|u_j\|$. Then $\{u_1, u_2, \ldots, u_n\}$ is an orthogonal basis for $\mathbb{R}^n$.


The proof is the subject of Exercise 14. 


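Example 7.2.27 Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by left multiplication by matrix

\[
M = \begin{pmatrix} 1 & -2 & 2 \\ -2 & 4 & -4 \\ 2 & -4 & 4 \end{pmatrix}.
\]

We will find an orthonormal basis for $\operatorname{Null}(T) = \operatorname{Null}(M)$. We have

\[
\begin{aligned}
\operatorname{Null}(M) &= \left\{ u = (a, b, c)^T \in \mathbb{R}^3 \mid Mu = 0 \right\} \\
&= \left\{ u = (a, b, c)^T \in \mathbb{R}^3 \mid a - 2b + 2c = 0 \right\} \\
&= \left\{ u = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} b + \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix} c \in \mathbb{R}^3 \;\middle|\; b, c \in \mathbb{R} \right\} \\
&= \operatorname{Span}\left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix} \right\},
\end{aligned}
\]

so that a basis for $\operatorname{Null}(M)$ is $\left\{ (2, 1, 0)^T, (-2, 0, 1)^T \right\}$. We begin by formulating an orthogonal basis. We use the first vector, $u_1 = (2, 1, 0)^T$, as the first vector in the orthogonal basis. The second vector is found using the Gram-Schmidt procedure.

\[
u_2 = \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix} - \frac{-4}{5} \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}
= \frac{1}{5} \begin{pmatrix} -2 \\ 4 \\ 5 \end{pmatrix}.
\]

So, an orthogonal basis is

\[
\left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -2 \\ 4 \\ 5 \end{pmatrix} \right\}.
\]

Therefore, after scaling each vector to be a unit vector, we have the orthonormal basis

\[
\left\{ \frac{1}{\sqrt{5}} \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \; \frac{1}{3\sqrt{5}} \begin{pmatrix} -2 \\ 4 \\ 5 \end{pmatrix} \right\}.
\]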
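An orthonormal basis for a null space can also be obtained numerically without hand-solving the system. The sketch below swaps in a different technique from the one in the example: it reads the basis from the singular value decomposition rather than applying Gram-Schmidt. It assumes NumPy, the function name `null_space_basis` is hypothetical, and the matrix is taken to have rows $(1, -2, 2)$, $(-2, 4, -4)$, $(2, -4, 4)$. The basis it returns generally differs from a hand-computed one by a rotation or sign change within the null space.

```python
import numpy as np

def null_space_basis(M, rtol=1e-12):
    """Orthonormal basis of Null(M), returned as the columns of a matrix.

    Uses the SVD: right singular vectors belonging to (numerically) zero
    singular values span the null space and are orthonormal.
    """
    M = np.asarray(M, dtype=float)
    _, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > rtol * s[0])) if s.size and s[0] > 0 else 0
    return Vt[rank:].T

M = np.array([[1.0, -2.0, 2.0],
              [-2.0, 4.0, -4.0],
              [2.0, -4.0, 4.0]])
N = null_space_basis(M)

print(N.shape)                   # two basis vectors in R^3
print(np.round(M @ N, 12))       # each column is sent to zero
print(np.round(N.T @ N, 12))     # columns are orthonormal
```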


We conclude with two examples from the inner product space of polynomials on [0, 1]. 


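Example 7.2.28 We can find an orthogonal basis for $\mathcal{P}_2([0, 1])$ by the Gram-Schmidt process using the standard basis $\{v_1 = 1, v_2 = x, v_3 = x^2\}$:

\[
u_1 = v_1 = 1.
\]

\[
u_2 = v_2 - \frac{\langle u_1, v_2 \rangle}{\langle u_1, u_1 \rangle} u_1
= x - \frac{\int_0^1 (1)(x)\,dx}{\int_0^1 (1)(1)\,dx}
= x - \frac{1}{2}.
\]

For the next step we use the scalar multiple $2x - 1$ in place of $u_2$, which does not change the projection:

\[
\begin{aligned}
u_3 &= v_3 - \frac{\langle u_1, v_3 \rangle}{\langle u_1, u_1 \rangle} u_1 - \frac{\langle u_2, v_3 \rangle}{\langle u_2, u_2 \rangle} u_2 \\
&= x^2 - \frac{\int_0^1 x^2\,dx}{\int_0^1 1\,dx} - \frac{\int_0^1 x^2 (2x - 1)\,dx}{\int_0^1 (2x - 1)^2\,dx} (2x - 1) \\
&= x^2 - \frac{1}{3} - \frac{1}{2}(2x - 1) \\
&= x^2 - x + \frac{1}{6}.
\end{aligned}
\]

Thus, we have the orthogonal basis $\mathcal{B} = \{1, 2x - 1, 6x^2 - 6x + 1\}$, where each vector was scaled to eliminate fractions. We can check orthogonality by computing the pairwise inner products.

\[
\langle 1, 2x - 1 \rangle = \int_0^1 (2x - 1)\,dx = \left. (x^2 - x) \right|_0^1 = 0.
\]

\[
\langle 1, 6x^2 - 6x + 1 \rangle = \int_0^1 (6x^2 - 6x + 1)\,dx = \left. (2x^3 - 3x^2 + x) \right|_0^1 = 0.
\]

\[
\langle 2x - 1, 6x^2 - 6x + 1 \rangle = \int_0^1 (12x^3 - 18x^2 + 8x - 1)\,dx = \left. (3x^4 - 6x^3 + 4x^2 - x) \right|_0^1 = 0.
\]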


Example 7.2.29 We can find an orthonormal basis for $\mathcal{P}_2([0, 1])$ from the orthogonal basis found in Example 7.2.28: $\mathcal{B} = \{u_1 = 1, u_2 = 2x - 1, u_3 = 6x^2 - 6x + 1\}$. We only need to normalize each basis vector. Let $v_1 = a_1 u_1$ where $a_1 \in \mathbb{R}$ and $\|v_1\| = 1$. That is, $v_1$ is the normalized basis vector which is a scalar multiple of $u_1$.

\[
1 = \|v_1\| = a_1 \|u_1\| = a_1 \left( \int_0^1 (1)^2\,dx \right)^{1/2} = a_1.
\]

So, $a_1 = 1$. Similarly,

\[
1 = \|v_2\| = a_2 \|u_2\| = a_2 \left( \int_0^1 (2x - 1)^2\,dx \right)^{1/2} = \frac{a_2}{\sqrt{3}}.
\]

So, $a_2 = \sqrt{3}$.

\[
1 = \|v_3\| = a_3 \|u_3\| = a_3 \left( \int_0^1 (6x^2 - 6x + 1)^2\,dx \right)^{1/2} = \frac{a_3}{\sqrt{5}}.
\]

So, $a_3 = \sqrt{5}$. We have the orthonormal basis

\[
\left\{ 1, \; \sqrt{3}(2x - 1), \; \sqrt{5}(6x^2 - 6x + 1) \right\}.
\]
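The same process can be carried out for polynomials with the integral inner product on $[0, 1]$. This is a sketch, assuming NumPy's `numpy.polynomial.Polynomial` class for exact polynomial arithmetic on coefficient arrays; the helper names `inner` and `gram_schmidt_poly` are illustrative. Coefficients are stored in increasing degree, so `[1/6, -1, 1]` stands for $x^2 - x + 1/6$.

```python
import numpy as np
from numpy.polynomial import Polynomial

def inner(p, q, a=0.0, b=1.0):
    """Inner product <p, q> = integral of p(x) q(x) over [a, b]."""
    antiderivative = (p * q).integ()
    return antiderivative(b) - antiderivative(a)

def gram_schmidt_poly(polys):
    """Orthogonalize a list of polynomials with respect to `inner`."""
    basis = []
    for v in polys:
        u = v
        for b in basis:
            u = u - (inner(b, v) / inner(b, b)) * b
        basis.append(u)
    return basis

x = Polynomial([0.0, 1.0])
one = Polynomial([1.0])

u1, u2, u3 = gram_schmidt_poly([one, x, x**2])
orthonormal = [u / np.sqrt(inner(u, u)) for u in (u1, u2, u3)]

print(u2.coef)              # coefficients of x - 1/2
print(u3.coef)              # coefficients of x^2 - x + 1/6
print(orthonormal[2].coef)  # sqrt(5) * (6x^2 - 6x + 1), lowest degree first
```

Changing the default limits `a` and `b` gives the corresponding orthogonal family on another interval.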


Path to New Applications

Researchers use Fourier analysis to find patterns in data and images. This method uses an 
orthonormal basis of sinusoidal functions to rewrite data vectors, using the Fourier transform, 
as a linear combination of these functions. The coefficients in the linear combination allow 
researchers to view data vectors as coordinate vectors in the Fourier space (the corresponding 
coordinate space). See Section 8.3.2 to read more about connections between linear algebra 
techniques and applications. 


7.2.4 Exercises 


1. In each of the following exercises, the set B is a basis of the vector space V, the subspace X 
is the span of some of the vectors in B, and w is a vector in V. Find the coordinate projection 


cprojy w. 


wr -4-[()-G)] =e x-9[()}--() 
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1 0 1 0 1 
(bo) V=R*, B=] 1], 1]} CR’, X =Span}]0],/1]},w= |] 0 
0 i 1 1 i 0 
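(c) $V = \mathcal{P}_2(\mathbb{R})$, $\mathcal{B} = \{1, 1 + x, 1 + x + x^2\} \subseteq \mathcal{P}_2(\mathbb{R})$, $X = \operatorname{Span}\{1, 1 + x + x^2\}$, $w = 2 - 3x - x^2$.

2. (a) Calculate the coordinate projection $\operatorname{cproj}_{W_2} v$ of $v$ onto $W_2$ from Example 7.2.2. Repeat for Example 7.2.3 and Example 7.2.4.
(b) Verify that $v = \operatorname{cproj}_{W_1} v + \operatorname{cproj}_{W_2} v$ in all three cases, even though in all three cases the coordinate projections were different.
3. For each of the following exercises, find the orthogonal projection $\operatorname{proj}_X w$.

(a) $X = \mathbb{R}^3$, $w = (-1, 2, 0)^T \in \mathbb{R}^3$.

(b) $X = \operatorname{Span}\{(1, 0, 1)^T, (0, 2, 0)^T\} \subseteq \mathbb{R}^3$, $w = (1, 2, 3)^T \in \mathbb{R}^3$.

(c) $X = \operatorname{Span}\{1 + x^2, x - 3\} \subseteq \mathcal{P}_2(\mathbb{R})$, $w = 2 + 3x + 4x^2 \in \mathcal{P}_2(\mathbb{R})$.

(d) $X = \operatorname{Span}\{1, x, x^2, x^3\} \subseteq C([0, \pi])$, $w = \sin x \in C([0, \pi])$, where $C([0, \pi])$ is the vector space of continuous functions on $[0, \pi]$ and $\langle f(x), g(x) \rangle = \int_0^\pi f(x) g(x)\,dx$.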


(e) 
X = Span 7. “7 rt C T4,.4(R) 
Image A Image B mage 
_ € Zy,4(R). 
Image 4 
(f) 
=> syste a 


Use the inner product $\langle u, v \rangle = 1$ if $u$ and $v$ have at least one lit bar in common, and $\langle u, v \rangle = 0$ otherwise.


4. Let B = {u1, u2} be an orthogonal basis of R2, and let W = Span {u,}. For v € R2, what is the 
difference between the coordinate projection cproj w (v) and the orthogonal projection proj yw (v)? 
[Hint: what must be true about uw, in order to have cprojw(v) = projy(v)?] 
5. Consider the inner product space $V$ and subspace $W = \{0\}$. Describe the sets $V^\perp$ and $W^\perp$.
6. For each subspace, find orthonormal bases for $W$ and for $W^\perp$.


=] 0 1 
(a) W = Span 1],/-1],/-1}} ¢cR 
0 0 1 
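(b) $W = \left\{ (a, b, c)^T \mid a + b + c = 0 \right\} \subseteq \mathbb{R}^3$.

(c) $W = \left\{ (a, b, c, d)^T \mid a + 2c = 0 \right\} \subseteq \mathbb{R}^4$.

(d) $W = \{a + bx + cx^2 \mid a = b\} \subseteq \mathcal{P}_2(\mathbb{R})$.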


7. Let $\mathbf{1}_n$ be the vector in $\mathbb{R}^n$ whose entries are all 1, and let $L$ be the subspace of all linear combinations of $\mathbf{1}_n$. Show that for any vector $u$ in $\mathbb{R}^n$, $\operatorname{proj}_L u = \bar{u} \mathbf{1}_n$, where $\bar{u}$ is the mean value of the entries in $u$.

8. Given the basis $\mathcal{B} = \{1, x, x^2, x^3\}$ for $\mathcal{P}_3(\mathbb{R})$, and the inner product (note the limits of integration)

\[
\langle p_1(x), p_2(x) \rangle = \int_{-1}^{1} p_1(x) p_2(x)\,dx,
\]

find an orthonormal basis for $\mathcal{P}_3(\mathbb{R})$ which includes a scalar multiple of the first basis vector in $\mathcal{B}$. Compare your result with the first four Legendre polynomials.
9. Find the projections of the vector $x = (1, 2, 0, 1) \in \mathbb{R}^4$ onto each of the eigenspaces of the matrix operator

\[
M = \begin{pmatrix} -4 & 1 & 1 & 2 \\ -9 & 2 & 1 & 6 \\ 0 & 0 & 1 & 0 \\ -3 & 1 & 1 & 1 \end{pmatrix}.
\]

10. Formulate an inner product on $\mathcal{P}_1(\mathbb{R})$ in which the standard basis $\{1, x\}$ is orthonormal.

11. Formulate an inner product on $\mathcal{P}_2(\mathbb{R})$ in which the standard basis $\{1, x, x^2\}$ is orthonormal.

12. Prove Theorem 7.2.12. 

13. Prove Corollary 7.2.20. 

14. Prove Corollary 7.2.26. 

15. Let $V$ be an inner product space and $W \subseteq V$ be a subspace. Prove that $\dim W + \dim W^\perp = \dim V$.

16. Let $V$ be an inner product space and suppose that $W \subseteq V$. Prove that $(\operatorname{Span} W)^\perp = W^\perp$. In other words, adding linear combinations of vectors in $W$ to $W$ does not change the orthogonal complement.

17. Let $V$ be an inner product space and $W \subseteq V$ be a subspace. Prove that $(W^\perp)^\perp = W$.


18. Let V=R?,W= {(;3 


2 
Compare with Example 7.2.4. 

19. Let $V$ be an inner product space and $W_1 \subseteq W_2 \subseteq V$. What can you say about $W_1^\perp$ and $W_2^\perp$? Explain.

20. Prove that if the columns of the $n \times k$ matrix $M$ form a linearly independent set, then $\operatorname{rank}(M M^T) = k$.

21. Modify the Gram-Schmidt process to find an orthogonal basis directly from a finite spanning set.


)}. and v= ({) Compute the orthogonal projection of v onto W. 
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22. Consider R* with the basis B = {(;) ; (5) and let W = Span (i) \. For any v € R’, 


denote v1 = cprojy (v). In this exercise, we will explore the quantity || v1 || /||v||. In the introduction, 
we claimed that this quantity gives a measure of closeness of a vector to the subspace W. This 
exercise will demonstrate why geometrically it is preferable to use orthogonal projections rather 
than coordinate projections for this purpose. 


(a) Set the basis vector (5) — (1). 


i. Sketch the subspaces W = Span (1) and X = Span {(3) on the same axes. 


ii. To develop some intuition for the next part of this problem, calculate || v1 ||/||v|| for 


v= (ants =(*) 


iii. Find all vectors in R? with the property that ||v;||/||v|] = 1. 
(b) Repeat part (a) with the basis vector ; = : I 
(c) Based on your observations, does the quantity ||vj||/||v|| give a better measurement of 
closeness of v to W in part (a) or in part (b)? Explain. 


Exercises 23 through 32 further explore projections as transformations. Let $V$ be an $n$-dimensional inner product space, $W$ a subspace of $V$, $X = W^\perp$ and $v \in V$. Let $P_W : V \to V$ be the transformation defined by $P_W v = \operatorname{proj}_W v$.

23. Show that $P_W$ is linear.

24. Show that Poa = Pyov. 

25. Show that $\|P_W v\| \le \|v\|$.

26. Show that $v = P_W v + P_X v$.

27. Show that $\operatorname{Null}(P_W) = W^\perp$.

28. Show that if $\dim W > 0$ then $\lambda = 1$ is an eigenvalue of $P_W$.

29. Show that if $\dim W < n$ then $\lambda = 0$ is an eigenvalue of $P_W$.

30. Show that $\lambda = 0$ and $\lambda = 1$ are the only eigenvalues of $P_W$.

31. Show that $P_W$ is diagonalizable.

32. Show that if $\dim W < n$ then $P_W$ is neither injective nor surjective.


7.3. Orthogonal Transformations 


In our study of the heat diffusion transformation, we found that diagonalizing the matrix representation was a significant simplifying step in analyzing long-term behavior. Recall that, because the diffusion matrix $E$ is diagonalizable, we were able to write, for some initial heat state $h_0$, the heat state $k$ time steps later as

\[
h_k = E^k h_0 = Q D^k Q^{-1} h_0,
\]

where $Q$ is the matrix whose columns are eigenvectors of $E$ and $D$ is the diagonal matrix of the corresponding eigenvalues. The computation was much simpler because computing $D^k$ consists only of raising the diagonal elements (eigenvalues) to the $k$th power, resulting in far fewer computations than those involving matrix multiplication.

We also saw that the eigenvalues and eigenspaces of the transformation were much easier to visualize and interpret than the linear combinations that made up particular heat states. This approach led to a clearer understanding of the time evolution of heat signatures. We also noted that the diffusion matrix, Equation 4.3,

\[
E = \begin{pmatrix}
1 - 2\delta & \delta & 0 & \cdots & 0 \\
\delta & 1 - 2\delta & \delta & & \vdots \\
0 & \delta & \ddots & \ddots & 0 \\
\vdots & & \ddots & 1 - 2\delta & \delta \\
0 & \cdots & 0 & \delta & 1 - 2\delta
\end{pmatrix}
\tag{7.6}
\]

(where $0 < \delta \le \tfrac{1}{4}$) is symmetric. This symmetry reflects the fact that heat flows in all directions equally (in our one-dimensional case, both left and right) scaled only by the magnitude of the heat gradient. In Exercise 18 of Section 6.2, we found that the normalized eigenvectors of $E$ are

\[
u_k = \sqrt{\frac{2}{m+1}} \left( \sin\frac{\pi k}{m+1}, \; \sin\frac{2\pi k}{m+1}, \; \ldots, \; \sin\frac{m\pi k}{m+1} \right)^T, \quad k = 1, 2, \ldots, m
\tag{7.7}
\]

and the corresponding eigenvalues are

\[
\lambda_k = 1 - 2\delta \left( 1 - \cos\frac{\pi k}{m+1} \right).
\tag{7.8}
\]

We know that $\mathcal{U} = \{u_1, u_2, \ldots, u_m\}$ is an eigenbasis for $\mathbb{R}^m$. However, using topics from Sections 7.1 and 7.2, we can now show that $\mathcal{U}$ is an orthonormal basis for $\mathbb{R}^m$ (see Exercise 19).
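A numerical experiment can make the orthonormality claim plausible before proving it. The sketch below assumes NumPy and a tridiagonal diffusion matrix with $1 - 2\delta$ on the diagonal and $\delta$ on the two neighboring diagonals; the function names and the values $m = 6$, $\delta = 0.2$ are illustrative only. It builds the sine vectors and eigenvalues from the closed-form expressions and checks both $E u_k = \lambda_k u_k$ and $U^T U = I$.

```python
import numpy as np

def diffusion_matrix(m, delta):
    """m x m tridiagonal matrix: 1 - 2*delta on the diagonal, delta beside it."""
    E = (1.0 - 2.0 * delta) * np.eye(m)
    E += delta * (np.eye(m, k=1) + np.eye(m, k=-1))
    return E

def sine_eigenpairs(m, delta):
    """Closed-form eigenvalues and normalized sine eigenvectors (as columns)."""
    j = np.arange(1, m + 1)  # component index within a vector
    k = np.arange(1, m + 1)  # index of the eigenvector
    U = np.sqrt(2.0 / (m + 1)) * np.sin(np.pi * np.outer(j, k) / (m + 1))
    lam = 1.0 - 2.0 * delta * (1.0 - np.cos(np.pi * k / (m + 1)))
    return lam, U

m, delta = 6, 0.2
E = diffusion_matrix(m, delta)
lam, U = sine_eigenpairs(m, delta)

print(np.allclose(E @ U, U * lam))        # column k satisfies E u_k = lam_k u_k
print(np.allclose(U.T @ U, np.eye(m)))    # the columns are orthonormal
```

A check like this is evidence for one choice of $m$ and $\delta$, not a proof.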

In this section, we will focus on the relationship between the diagonalizability of a matrix and the 
symmetry of the corresponding transformation. We begin by examining the properties of matrices that 
have orthonormal columns. 


7.3.1 Orthogonal Matrices 


In Section 7.2, we employed orthogonal projections to create orthogonal and orthonormal bases using the Gram-Schmidt process. Now, suppose we have such a basis $\mathcal{U}$, and define $Q$ to be the coordinate transformation matrix from the orthonormal basis $\mathcal{U}$ to the standard basis $\mathcal{S}$. That is, $Q[x]_{\mathcal{U}} = [x]_{\mathcal{S}}$ for any $x \in \mathbb{R}^n$.

Notice that if $\mathcal{U} = \{u_1, u_2, \ldots, u_n\}$, then for $1 \le k \le n$, $Q[u_k]_{\mathcal{U}} = u_k$. But, $[u_k]_{\mathcal{U}} = e_k \in \mathbb{R}^n$. Therefore, the columns of $Q$ must be the orthonormal basis vectors, $u_1, u_2, \ldots, u_n$ (written in standard coordinates). Next, we give a name to matrices whose columns are made up of orthonormal basis elements.

Definition 7.3.1

The $n \times n$ matrix $Q$ with columns $u_1, u_2, \ldots, u_n$ is said to be orthogonal if $\mathcal{U} = \{u_1, u_2, \ldots, u_n\}$ is an orthonormal basis for $\mathbb{R}^n$.


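The definition of an orthogonal matrix requires that the columns form an orthonormal basis, not just an orthogonal basis. For an orthogonal matrix $Q$, we can write

\[
Q = \begin{pmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_n \\ | & | & & | \end{pmatrix},
\quad \text{where} \quad
\langle u_j, u_k \rangle = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \neq k. \end{cases}
\]

Lemma 7.3.2
Let $Q$ be an $n \times n$ orthogonal matrix. Then $Q^T Q = I_n$.

Proof. Suppose $Q$ is an $n \times n$ orthogonal matrix with columns $u_1, u_2, \ldots, u_n$. We compute the $jk$-entry of $Q^T Q$ to be

\[
(Q^T Q)_{jk} = u_j^T u_k = \langle u_j, u_k \rangle = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \neq k. \end{cases}
\]

Thus, $Q^T Q = I_n$.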


Orthogonal matrices have several interesting properties with important interpretations as coordinate transformations. Recall that $u_i$, the $i$th column of $Q$, represents the image of $e_i$, the $i$th standard basis vector. Hence, as a transformation, $T_Q : \mathbb{R}^n \to \mathbb{R}^n$, defined by $T_Q(v) = Qv$, maps the orthonormal set $\{e_1, e_2, \ldots, e_n\}$ to the orthonormal set $\{u_1, u_2, \ldots, u_n\}$ as follows:

\[
e_1 \mapsto u_1, \quad e_2 \mapsto u_2, \quad \ldots, \quad e_n \mapsto u_n.
\]

Since $T_Q$ preserves lengths of basis vectors and angles between them, it is reasonable to expect that it will preserve lengths of all vectors and angles between any two vectors. We formalize this result along with other algebraic properties in the following theorem.


Theorem 7.3.3 
Let $Q$ be an $n \times n$ real orthogonal matrix, $x$, $y$ arbitrary vectors in $\mathbb{R}^n$, and $\theta_{x,y}$ the angle between $x$ and $y$. Then

1. $Q^{-1} = Q^T$,
2. $Q^T$ is orthogonal,
3. $\|Qx\| = \|x\|$ ($Q$ preserves vector lengths),
4. $\langle Qx, Qy \rangle = \langle x, y \rangle$,
5. $\cos\theta_{Qx,Qy} = \cos\theta_{x,y}$ ($Q$ preserves angles between vectors), and
6. if $\lambda$ is a (real) eigenvalue of $Q$, then $\lambda = \pm 1$.


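Proof. Suppose $Q$ is an $n \times n$ real orthogonal matrix and

\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \in \mathbb{R}^n.
\]

We write

\[
Q = \begin{pmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_n \\ | & | & & | \end{pmatrix},
\]

where $\{u_1, u_2, \ldots, u_n\}$ forms an orthonormal set in $\mathbb{R}^n$.

1. By Lemma 7.3.2, $Q^T Q = I_n$. Thus, by the invertible matrix theorem (Theorem 5.2.20), $Q^T = Q^{-1}$.

2. The proof of part 2 above is the subject of Exercise 16.

3. Using the result of part (1) and Theorem 7.1.5, we have

\[
\|Qx\|^2 = \langle Qx, Qx \rangle = (Qx)^T (Qx) = x^T Q^T Q x = x^T x = \|x\|^2,
\]

so $\|Qx\| = \|x\|$.

4. Again, using the result of part (1) and Theorem 7.1.5, we have

\[
\langle Qx, Qy \rangle = (Qx)^T (Qy) = x^T Q^T Q y = x^T y = \langle x, y \rangle.
\]

5. Using the results of parts (3) and (4), we have

\[
\cos\theta_{Qx,Qy} = \frac{\langle Qx, Qy \rangle}{\|Qx\| \, \|Qy\|} = \frac{\langle x, y \rangle}{\|x\| \, \|y\|} = \cos\theta_{x,y}.
\]

6. Suppose $v$ is an eigenvector of $Q$ with eigenvalue $\lambda$. Then, using the result of part (3), $\|v\| = \|Qv\| = \|\lambda v\| = |\lambda| \, \|v\|$. Thus $|\lambda| = 1$. As $\lambda \in \mathbb{R}$, $\lambda = \pm 1$.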
This theorem not only shows that vector norms and angles between vectors are preserved under left multiplication by $Q$, but also gives the very nice result that the inverse of an orthogonal matrix is simply its transpose. This has very useful implications for the heat diffusion scenario. Since we found the diagonalization of the heat diffusion matrix as $E = Q D Q^{-1}$ and we found $Q$ to be an orthogonal matrix, we apply Theorem 7.3.3 part 1 to write

\[
E = Q D Q^{-1} = Q D Q^T.
\]


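Example 7.3.4 Consider the orthonormal basis for $\mathbb{R}^2$

\[
\mathcal{U} = \left\{ u_1 = \begin{pmatrix} 1/2 \\ \sqrt{3}/2 \end{pmatrix}, \; u_2 = \begin{pmatrix} -\sqrt{3}/2 \\ 1/2 \end{pmatrix} \right\}
\]

and the orthogonal matrix

\[
Q = \begin{pmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{pmatrix}.
\]

The reader should verify that $Q$ is indeed orthogonal and that $Q^T = Q^{-1}$. For an arbitrary vector $x = (a, b)^T \in \mathbb{R}^2$,

\[
\|Qx\| = \left\| \frac{1}{2} \begin{pmatrix} a - \sqrt{3} b \\ \sqrt{3} a + b \end{pmatrix} \right\|
= \frac{1}{2} \left( (a - \sqrt{3} b)^2 + (\sqrt{3} a + b)^2 \right)^{1/2}
= \left( a^2 + b^2 \right)^{1/2} = \|x\|.
\]

Consider also the arbitrary vector $y = (c, d)^T \in \mathbb{R}^2$. We have

\[
\begin{aligned}
\langle Qx, Qy \rangle &= (Qx)^T (Qy) \\
&= \frac{1}{4} \begin{pmatrix} a - \sqrt{3} b & \sqrt{3} a + b \end{pmatrix} \begin{pmatrix} c - \sqrt{3} d \\ \sqrt{3} c + d \end{pmatrix} \\
&= \frac{1}{4} \left[ (a - \sqrt{3} b)(c - \sqrt{3} d) + (\sqrt{3} a + b)(\sqrt{3} c + d) \right] \\
&= \frac{1}{4} \left[ 4ac + 4bd \right] \\
&= ac + bd \\
&= \langle x, y \rangle.
\end{aligned}
\]

The matrix $Q$ has no real eigenvalues. Indeed, if we choose as starting vector $u_1$ and employ the Factor Method for finding eigenvalues, we find

\[
\{u_1, Qu_1, Q^2 u_1\} = \left\{ \begin{pmatrix} 1/2 \\ \sqrt{3}/2 \end{pmatrix}, \begin{pmatrix} -1/2 \\ \sqrt{3}/2 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \end{pmatrix} \right\}.
\]

We know that this set is linearly dependent because it has three vectors in $\mathbb{R}^2$. In fact, we have the linear dependence relation

\[
0 = Q^2 u_1 - Q u_1 + u_1 = (Q^2 - Q + I) u_1.
\]

Notice that if we try to factor $Q^2 - Q + I$, we find that it does not factor into something of the form $(aQ + bI)(cQ + dI)$ for any real numbers $a$, $b$, $c$, and $d$. Therefore, $Q$ does not have any real eigenvalues. In fact, we have

\[
Q^2 - Q + I = \left( Q - \frac{1 + \sqrt{3}\,i}{2} I \right) \left( Q - \frac{1 - \sqrt{3}\,i}{2} I \right).
\]

This means that there are two complex eigenvalues $\lambda = \frac{1}{2} \pm \frac{\sqrt{3}}{2} i$. Notice that this situation does not violate the eigenvalue property stated in Theorem 7.3.3. However, the absence of two real eigenvalues does mean that $Q$ is not diagonalizable over the real numbers.

As a coordinate transformation, $[x]_{\mathcal{U}} = Q^T x$, we observe that left multiplication by $Q^T$ performs a rotation about the coordinate origin in $\mathbb{R}^2$ by an angle of $-\pi/3$. The reader should check this by plotting a vector in $\mathbb{R}^2$ and the result after multiplying by $Q^T$.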
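The properties listed in Theorem 7.3.3 can be spot-checked for a rotation matrix. The following sketch assumes NumPy; the function name `rotation` and the test vectors are arbitrary illustrative choices. It uses the rotation through $\pi/3$, whose columns are $(1/2, \sqrt{3}/2)^T$ and $(-\sqrt{3}/2, 1/2)^T$.

```python
import numpy as np

def rotation(theta):
    """2 x 2 matrix rotating vectors counterclockwise through theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s, c]])

Q = rotation(np.pi / 3)
x = np.array([3.0, -1.0])
y = np.array([0.5, 2.0])

checks = {
    "Q^T Q = I": np.allclose(Q.T @ Q, np.eye(2)),
    "inverse equals transpose": np.allclose(np.linalg.inv(Q), Q.T),
    "lengths preserved": np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)),
    "inner products preserved": np.isclose((Q @ x) @ (Q @ y), x @ y),
}

# The eigenvalues form a complex-conjugate pair on the unit circle.
eigenvalues = np.linalg.eigvals(Q)

for name, passed in checks.items():
    print(name, passed)
print(eigenvalues)
```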


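Example 7.3.5 In this example, we construct a family of $2 \times 2$ orthogonal matrices $M_\theta$ for which left multiplication by $M_\theta$ of a vector in $\mathbb{R}^2$ results in a counterclockwise rotation of that vector through an angle $\theta$. We build the transformation matrix column-by-column by transforming a set of basis vectors. Let $T_\theta : \mathbb{R}^2 \to \mathbb{R}^2$ be the transformation defined by $T_\theta(x) = M_\theta x$. If we work in the standard basis for $\mathbb{R}^2$, we have

\[
M_\theta = \begin{pmatrix} T_\theta \begin{pmatrix} 1 \\ 0 \end{pmatrix} & T_\theta \begin{pmatrix} 0 \\ 1 \end{pmatrix} \end{pmatrix}
= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]

If $\theta = \frac{\pi}{3}$ then $M_\theta$ is the matrix $Q$ from Example 7.3.4. Notice that if $v = (1, 2)^T$, we have

\[
T_{\pi/3}(v) = \begin{pmatrix} \cos\frac{\pi}{3} & -\sin\frac{\pi}{3} \\ \sin\frac{\pi}{3} & \cos\frac{\pi}{3} \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix}
= \begin{pmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix}
= \begin{pmatrix} 1/2 - \sqrt{3} \\ \sqrt{3}/2 + 1 \end{pmatrix}.
\]

In Figure 7.18, we see that $T_{\pi/3}$ transforms $v$ by rotating it counterclockwise by an angle of $\frac{\pi}{3}$.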


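Fig. 7.18 $T_{\pi/3}$ transforms $v$ by rotating it by $\frac{\pi}{3}$ about the origin.

Example 7.3.6 Consider the orthonormal basis for $\mathbb{R}^2$

\[
\mathcal{U} = \left\{ u_1 = \begin{pmatrix} 1/2 \\ \sqrt{3}/2 \end{pmatrix}, \; u_2 = \begin{pmatrix} \sqrt{3}/2 \\ -1/2 \end{pmatrix} \right\}
\]

and the orthogonal matrix

\[
Q = \begin{pmatrix} 1/2 & \sqrt{3}/2 \\ \sqrt{3}/2 & -1/2 \end{pmatrix}.
\]

The reader should verify the properties of Theorem 7.3.3. Even though $Q$ looks quite similar to the matrix in Example 7.3.4, we will see that this orthogonal matrix does not represent a rotation. The eigenvalues of $Q$ are $\lambda_1 = 1$ and $\lambda_2 = -1$. The eigenspaces associated with each eigenvalue are

\[
\mathcal{E}_1 = \left\{ \alpha \begin{pmatrix} \sqrt{3} \\ 1 \end{pmatrix} \;\middle|\; \alpha \in \mathbb{R} \right\},
\qquad
\mathcal{E}_2 = \left\{ \alpha \begin{pmatrix} -1 \\ \sqrt{3} \end{pmatrix} \;\middle|\; \alpha \in \mathbb{R} \right\}.
\]

Notice that $\mathcal{E}_1$ and $\mathcal{E}_2$ are orthogonal subspaces and $\mathbb{R}^2 = \mathcal{E}_1 \oplus \mathcal{E}_2$. So, every $v \in \mathbb{R}^2$ can be uniquely expressed as $v = v_1 + v_2$ where $v_1 \in \mathcal{E}_1$ and $v_2 \in \mathcal{E}_2$. Notice also that $Qv = Qv_1 + Qv_2 = v_1 - v_2$. That is, $Q$ represents a reflection transformation about $\mathcal{E}_1$. Exercise 5 asks the reader to create a visualization of this result.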


Example 7.3.7 We can construct a family of $2 \times 2$ orthogonal matrices $M_u$ for which left multiplication by $M_u$ of a vector in $\mathbb{R}^2$ results in a reflection of that vector about the line $L_u = \operatorname{Span}\{u\} = \{\alpha u \mid \alpha \in \mathbb{R}\}$. Let $u$ be the unit vector $u = (a, b)^T$. Then, we have (see Exercise 15)

\[
M_u = \begin{pmatrix} a^2 - b^2 & 2ab \\ 2ab & b^2 - a^2 \end{pmatrix}.
\]

Next, we explore the diagonalizability of rotation transformations in R?. 


Example 7.3.8 Consider the family of rotation matrices for vectors in $\mathbb{R}^2$

\[
M_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]

By considering when $M_\theta - \lambda I$ is not invertible, we can see that any eigenvalues $\lambda$ of $M_\theta$ are zeros of $C(\lambda) = (\cos\theta - \lambda)^2 + \sin^2\theta$. So, we have $\lambda = \cos\theta \pm \sqrt{\cos^2\theta - 1}$, with real-valued solutions, when $\theta = k\pi$, $k \in \mathbb{Z}$, of $\lambda = \cos k\pi = (-1)^k$. In that case the eigenspace has dimension 2. However, if $\theta$ is not an integer multiple of $\pi$, then real eigenvalues do not exist. Thus, by the results of Section 6.2.5, $M_\theta$ is diagonalizable if and only if $\theta = k\pi$ for some $k \in \mathbb{Z}$.

7.3.2 Orthogonal Diagonalization


We have considered transformation matrices whose columns form an orthonormal basis of R”. In this 
section, we explore matrices whose eigenvectors form an orthonormal basis. Transformations that are 
diagonalizable with a set of orthogonal eigenvectors have special properties and deserve a special 
name. 


Definition 7.3.9 


We say that matrix M is orthogonally diagonalizable if 


\[
M = Q D Q^T,
\]


where Q is orthogonal and D is diagonal. 


Orthogonally diagonalizable transformations are decomposed into three successive transformations $M = Q D Q^T$: an orthogonal coordinate transformation ($Q^T$), followed by a coordinate scaling ($D$), and ending with the inverse coordinate transformation ($Q$). The heat diffusion transformation is an example of an orthogonally diagonalizable transformation. We will explore this more in Section 7.3.4.

For now, because the heat diffusion transformation is also symmetric, we want to answer a key question: What is the relationship between symmetric matrices and orthogonally diagonalizable transformations? It turns out that they are the same. In fact, a necessary and sufficient condition⁵ for a matrix to be orthogonally diagonalizable is that it be symmetric. We will prove this beautiful and surprising fact, known as the Spectral Theorem, using the following two lemmas.


Lemma 7.3.10
Let $M$ be a real $n \times n$ symmetric matrix and $u, v \in \mathbb{R}^n$. Then $\langle Mu, v \rangle = \langle u, Mv \rangle$.

Proof. Suppose $M$ is a real symmetric matrix and let $u, v \in \mathbb{R}^n$. Then, $M^T = M$. So, we have

\[
\langle Mu, v \rangle = (Mu)^T v = u^T M^T v = u^T M v = \langle u, Mv \rangle.
\]

In fact, the converse of this lemma is also true: Given a real-valued $n \times n$ matrix $M$, if for all $u, v$ in $\mathbb{R}^n$, $\langle Mu, v \rangle = \langle u, Mv \rangle$, then $M$ is symmetric. The proof is the subject of Exercise 21.


Lemma 7.3.11
Let $M$ be a real $n \times n$ symmetric matrix. Then $M$ has $n$ real eigenvalues, not necessarily distinct.

Proof. Suppose $M$ is a real $n \times n$ symmetric matrix. Suppose $v$ is a nonzero vector such that $Mv = \lambda v$ for some, possibly complex, $\lambda$. By Theorem 6.2.18, $M$ has $n$, possibly complex and not necessarily distinct, eigenvalues. We need only show that all $n$ eigenvalues are real. Using the result of Exercise 24 of Section 7.1, we have

\[
\|Mv\|^2 = \langle Mv, Mv \rangle = \overline{(Mv)}^T Mv = \bar{v}^T M^T M v = \bar{v}^T M^2 v = \lambda^2 \bar{v}^T v = \lambda^2 \|v\|^2.
\]

So, $\lambda^2 = \|Mv\|^2 / \|v\|^2 \in \mathbb{R}$ and $\lambda^2 \ge 0$. Thus, $\lambda \in \mathbb{R}$. That is, $M$ has $n$ real eigenvalues.

⁵ A mathematician will use the words "necessary and sufficient condition" for a result to indicate that the condition is equivalent to the result. That is, the result is true if and only if the condition is true. Condition A is called necessary for result R if R implies A. Condition A is called sufficient for result R if A implies R.


Theorem 7.3.12 [Spectral Theorem] 
A real matrix M is orthogonally diagonalizable if and only if M is symmetric. 


Proof ($\Rightarrow$). Let $M$ be a real matrix that is orthogonally diagonalizable. Then, there exist an orthogonal matrix $Q$ and a diagonal matrix $D$ so that $M = Q D Q^T$. Then,

\[
M^T = (Q D Q^T)^T = (Q^T)^T D^T Q^T = Q D Q^T = M.
\]

Therefore $M$ is symmetric.

($\Leftarrow$) Now, let $M$ be a real $n \times n$ symmetric matrix. We will show, by induction on $n$, that $M$ is orthogonally diagonalizable. First observe that every $1 \times 1$ matrix $M$ is orthogonally diagonalizable. It remains to show that every real $n \times n$ symmetric matrix is orthogonally diagonalizable if every real $(n - 1) \times (n - 1)$ symmetric matrix is orthogonally diagonalizable.

Suppose every real $(n - 1) \times (n - 1)$ symmetric matrix is orthogonally diagonalizable. Let $M$ be a real $n \times n$ symmetric matrix. By Lemma 7.3.11, $M$ has $n$ real eigenvalues. Let $\lambda$ be an eigenvalue of $M$ with corresponding unit eigenvector $v$. Let $W = \operatorname{Span}\{v\}$. Suppose $w \in W^\perp$; then, using Lemma 7.3.10,

\[
\langle v, Mw \rangle = \langle Mv, w \rangle = \langle \lambda v, w \rangle = \lambda \langle v, w \rangle.
\]

So, $\langle v, Mw \rangle = 0$ whenever $\langle v, w \rangle = 0$, and $Mw \in W^\perp$ whenever $w \in W^\perp$. Let $u, w \in W^\perp$ and consider the transformation $T : W^\perp \to W^\perp$ defined by $T(u) = Mu$ for all $u \in W^\perp$. Then

\[
\langle u, T(w) \rangle = \langle u, Mw \rangle = \langle Mu, w \rangle = \langle T(u), w \rangle.
\]

As a transformation from $W^\perp$ to $W^\perp$, we can fix an orthonormal basis for $W^\perp$ and represent $T$ by a real $(n - 1) \times (n - 1)$ matrix. By Exercise 21, this matrix is symmetric, so by the induction hypothesis it is orthogonally diagonalizable. So, there exists an orthonormal basis $\mathcal{B}$ of eigenvectors of $T$ for $W^\perp$. Thus, an orthonormal basis of eigenvectors of $M$ for $\mathbb{R}^n$ is $\mathcal{B} \cup \{v\}$.


The Spectral Theorem 7.3.12 tells us that symmetry is a necessary and sufficient condition for a 
real matrix to be orthogonally diagonalizable. In particular, we have the following two facts. 


1. If a real matrix M is symmetric, then M is orthogonally diagonalizable. That is, symmetry is a 
sufficient condition for a real matrix to be orthogonally diagonalizable. 

2. If a real matrix M is orthogonally diagonalizable, then M is symmetric. That is, symmetry is a 
necessary condition for a real matrix to be orthogonally diagonalizable. 
Corollary 7.3.13
Suppose $M$ is a real $n \times n$ symmetric matrix. Then, $M$ has $k$ distinct eigenvalues with associated orthogonal eigenspaces $\mathcal{E}_1, \ldots, \mathcal{E}_k$ for which $\mathbb{R}^n = \mathcal{E}_1 \oplus \ldots \oplus \mathcal{E}_k$.


The proof is the subject of Exercise 17. 


Example 7.3.14 Consider the symmetric matrix

\[ M = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{pmatrix}. \]

We can show that $M$ has $n = 3$ real eigenvalues and is orthogonally diagonalizable. The distinct
eigenvalues of $M$ are $\lambda_1 = 0$ and $\lambda_2 = -3$. Because $\lambda_1 = 0$, $\mathcal{E}_1 = \text{Null}(M)$. A basis for $\mathcal{E}_1$ is
$\{v_1 = (1, 1, 1)^T\}$. A basis for $\mathcal{E}_2$ is $\{v_2 = (-1, 1, 0)^T, v_3 = (-1, 0, 1)^T\}$. Notice that $v_1$ is orthogonal
to both $v_2$ and $v_3$, as expected by Corollary 7.3.13. However, $v_2$ and $v_3$ are not orthogonal. We use the
Gram-Schmidt process to find an orthonormal basis for $\mathbb{R}^3$ from the eigenvectors of $M$. We find the
orthogonal eigenbasis $\{(1, 1, 1)^T, (-1, 1, 0)^T, (-1, -1, 2)^T\}$. Notice that the first vector is in $\mathcal{E}_1$ and
the second and third vectors are in $\mathcal{E}_2$. Normalizing each vector in the basis yields an orthonormal
basis, which we express as columns in the (orthogonal) matrix $Q$:

\[ Q = \begin{pmatrix} \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & 0 & \frac{2}{\sqrt{6}} \end{pmatrix}. \]

With diagonal matrix

\[ D = \begin{pmatrix} 0 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & -3 \end{pmatrix}, \]

we have that $M$ is orthogonally diagonalizable as $M = QDQ^T$.
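
The Gram-Schmidt step and the final factorization can be checked numerically. The following is only a minimal sketch in plain Python, assuming that $M$ is the symmetric matrix determined by the eigenvalues and eigenvectors quoted in the example; the helper names `dot` and `gram_schmidt` are illustrative and not part of the text.

```python
import math

def dot(a, b):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors, in order."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(q, v)                      # component of v along q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

# Assumption: M is the symmetric matrix determined by the eigen-data in the example.
M = [[-2, 1, 1], [1, -2, 1], [1, 1, -2]]
eigenvalues = [0, -3, -3]
eigenvectors = [[1, 1, 1], [-1, 1, 0], [-1, 0, 1]]

q_cols = gram_schmidt(eigenvectors)            # q_cols[k] is the k-th column of Q

# Rebuild Q D Q^T as the sum of lambda_k * q_k q_k^T.
rebuilt = [[sum(lam * q[i] * q[j] for lam, q in zip(eigenvalues, q_cols))
            for j in range(3)] for i in range(3)]

max_error = max(abs(rebuilt[i][j] - M[i][j]) for i in range(3) for j in range(3))
print(max_error)   # should be at the level of floating-point round-off
```

Writing $QDQ^T$ as a sum of rank-one terms $\lambda_k q_k q_k^T$ avoids forming $Q$ explicitly; the term with $\lambda_1 = 0$ contributes nothing, which reflects that $\mathcal{E}_1 = \text{Null}(M)$.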


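Example 7.3.15 Consider the transformation $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ defined by $T(f(x)) = \frac{d}{dx}(x f(x))$.
Let $\mathcal{B} = \{\beta_1, \beta_2, \beta_3\} = \{x^2 + 1, x^2 - 1, x\}$, a basis for $P_2(\mathbb{R})$. We have

\begin{align*}
T(\beta_1) &= 3x^2 + 1 = 2\beta_1 + \beta_2, \\
T(\beta_2) &= 3x^2 - 1 = \beta_1 + 2\beta_2, \\
T(\beta_3) &= 2x = 2\beta_3,
\end{align*}

so that

\[ M = [T]_{\mathcal{B}} = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \]

The matrix $M$ is symmetric, so there exists an orthogonal diagonalization of $M$. For this example,
$M = QDQ^T$ where $Q$ is the orthogonal matrix

\[ Q = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \]

and $D$ is the diagonal matrix

\[ D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \]

Notice that $Q$ can also be expressed as

\[ Q = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \text{for } \theta = \pi/4, \]

which is a rotation in the first two coordinates (see Example 7.3.5) and the identity transformation in
the third coordinate. The transformation $T$, expressed as the orthogonal diagonalization in basis $\mathcal{B}$, is
the result of an orthogonal (rotation) transformation ($Q^T$) followed by a coordinate scaling ($D$) and
then by the inverse rotation ($Q$).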


7.3.3 Completing the Invertible Matrix Theorem 


Recall that one of our main goals in this text stems from the following scenario. We are studying a 
process that can be represented by a linear transformation (e.g., radiographic processes). Suppose we 
have an output of this process and we seek the particular input that gave this output. Then, we are 
essentially solving a matrix equation $Mx = b$. If $M$ is invertible, we find the input as $x = M^{-1}b$.

One of the important consequences of the Spectral Theorem 7.3.12 is another characterization of the
invertibility of a matrix. Theorem 7.3.16, below, collects all of the characterizations that we have
developed throughout this text; part (o) is the new statement, for which the equivalence with part (n)
can be proven using the Spectral Theorem.


Theorem 7.3.16 [Invertible Matrix Theorem] 
Let $M$ be an $n \times n$ real matrix with columns $c_1, c_2, \ldots, c_n \in \mathbb{R}^n$ and rows $r_1^T, r_2^T, \ldots, r_n^T$. Then
the following statements are equivalent.


(a) $M$ is invertible.

(b) The reduced echelon form of $M$ has $n$ leading ones.

(c) The matrix equation $Mx = 0$ has a unique solution.

(d) For every $b \in \mathbb{R}^n$, the matrix equation $Mx = b$ has a unique solution.

(e) The homogeneous system of equations with coefficient matrix $M$ has a unique solution.

(f) The set $\{c_1, c_2, \ldots, c_n\}$ is linearly independent.

(g) The set $\{c_1, c_2, \ldots, c_n\}$ is a basis for $\mathbb{R}^n$.

(h) $\det(M) \neq 0$.

(i) $\text{Rank}(M) = n$.

(j) $\text{Nullity}(M) = 0$.

(k) $\{r_1, r_2, \ldots, r_n\}$ is linearly independent.

(l) $\{r_1, r_2, \ldots, r_n\}$ is a basis for $\mathbb{R}^n$.

(m) $M^T$ is invertible.

(n) All eigenvalues of $M$ are nonzero.

(o) $M^TM$ has $n$ (not necessarily distinct) positive eigenvalues.
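
To see why statement (o) is tied to invertibility, the following short computation may help. It is only a sketch using the standard inner product, not the proof route named in the text (which goes through the Spectral Theorem and Exercise 12). Here $\mu$ denotes an arbitrary eigenvalue of $M^TM$ with eigenvector $v \neq 0$; the symbol $\mu$ is introduced for this illustration only.

```latex
\begin{align*}
M^T M v = \mu v,\ v \neq 0
  \quad &\Longrightarrow \quad
  \mu \langle v, v \rangle = \langle v, M^T M v \rangle = \langle Mv, Mv \rangle = \|Mv\|^2 \ge 0, \\
\mu = 0 \quad &\Longleftrightarrow \quad Mv = 0 .
\end{align*}
```

So every eigenvalue of the symmetric matrix $M^TM$ is nonnegative, and $0$ is an eigenvalue exactly when $\text{Null}(M)$ is nontrivial. Since $M^TM$ has $n$ real eigenvalues counted with multiplicity, all of them are positive precisely when $\text{Nullity}(M) = 0$, which is statement (j).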


The proof of Theorem 7.3.16 is a result of Theorems 5.2.20 and 6.2.25 and the result of Exercise 12. 
Now, we return to the heat diffusion example. 


7.3.4 Symmetric Diffusion Transformation 


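The heat diffusion matrix, Equation (7.6), is symmetric. It is also orthogonally diagonalizable, as
asserted by the Spectral Theorem 7.3.12 and verified in Exercise 19. We investigate many of the results
in this section in the context of a specific example heat state with $m = 3$. For this numerical example,
we let $\delta = 1/5$ so the diffusion matrix is

\[ E = \begin{pmatrix} 3/5 & 1/5 & 0 \\ 1/5 & 3/5 & 1/5 \\ 0 & 1/5 & 3/5 \end{pmatrix}. \]

The eigenvalues are $\lambda_k = 1 - \frac{2}{5}\left(1 - \cos\frac{\pi k}{4}\right) = \frac{3}{5} + \frac{2}{5}\cos\frac{\pi k}{4}$. We have

\begin{align*}
\lambda_1 &= \frac{3 + \sqrt{2}}{5} \approx 0.883, \\
\lambda_2 &= \frac{3}{5} = 0.600, \\
\lambda_3 &= \frac{3 - \sqrt{2}}{5} \approx 0.317.
\end{align*}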


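The eigenvalues are real (see Lemma 7.3.11) and, in this instance, distinct. The corresponding
normalized eigenvectors are

\begin{align*}
u_1 &= \frac{1}{\sqrt{2}} \begin{pmatrix} \sin\frac{\pi}{4} \\ \sin\frac{2\pi}{4} \\ \sin\frac{3\pi}{4} \end{pmatrix} = \begin{pmatrix} 1/2 \\ \sqrt{2}/2 \\ 1/2 \end{pmatrix}, \\
u_2 &= \frac{1}{\sqrt{2}} \begin{pmatrix} \sin\frac{2\pi}{4} \\ \sin\frac{4\pi}{4} \\ \sin\frac{6\pi}{4} \end{pmatrix} = \begin{pmatrix} \sqrt{2}/2 \\ 0 \\ -\sqrt{2}/2 \end{pmatrix}, \\
u_3 &= \frac{1}{\sqrt{2}} \begin{pmatrix} \sin\frac{3\pi}{4} \\ \sin\frac{6\pi}{4} \\ \sin\frac{9\pi}{4} \end{pmatrix} = \begin{pmatrix} 1/2 \\ -\sqrt{2}/2 \\ 1/2 \end{pmatrix}.
\end{align*}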

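The reader can verify that these vectors form an orthonormal basis for $\mathbb{R}^3$ and that they are eigenvectors
of $E$. The orthogonal diagonalization is $E = QDQ^T$ where

\[ D = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} = \begin{pmatrix} \frac{3+\sqrt{2}}{5} & 0 & 0 \\ 0 & \frac{3}{5} & 0 \\ 0 & 0 & \frac{3-\sqrt{2}}{5} \end{pmatrix} \]

and

\[ Q = \begin{pmatrix} | & | & | \\ u_1 & u_2 & u_3 \\ | & | & | \end{pmatrix} = \begin{pmatrix} 1/2 & \sqrt{2}/2 & 1/2 \\ \sqrt{2}/2 & 0 & -\sqrt{2}/2 \\ 1/2 & -\sqrt{2}/2 & 1/2 \end{pmatrix}. \]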


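Next, consider the particular heat state, represented in the standard basis,

\[ v = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}. \]

The heat state, $w$, resulting from a single time step in the heat diffusion process is given by left
multiplication by $E$:

\[ w = Ev = \begin{pmatrix} 4/5 \\ 6/5 \\ 7/5 \end{pmatrix}. \]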

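One can also view this transformation as a sequence of more elementary operations. First, the heat state
is written as a linear combination of the eigenvectors of $E$. The coordinates in this basis are computed
as $a = [v]_{\mathcal{U}}$ where $a_k = \langle u_k, v \rangle$ (see Corollary 7.1.31).

\[ [v]_{\mathcal{U}} = Q^T v = \begin{pmatrix} \langle u_1, v \rangle \\ \langle u_2, v \rangle \\ \langle u_3, v \rangle \end{pmatrix} = \begin{pmatrix} \frac{3+\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \\ \frac{3-\sqrt{2}}{2} \end{pmatrix}. \]

This orthogonal change in coordinates should preserve the vector norm. We can compute and compare
$\|v\|$ and $\|[v]_{\mathcal{U}}\|$, or equivalently $\langle v, v \rangle$ and $\langle [v]_{\mathcal{U}}, [v]_{\mathcal{U}} \rangle$:

\[ \langle v, v \rangle = 1^2 + 1^2 + 2^2 = 6, \]

\[ \langle [v]_{\mathcal{U}}, [v]_{\mathcal{U}} \rangle = \left(\tfrac{3+\sqrt{2}}{2}\right)^2 + \left(-\tfrac{\sqrt{2}}{2}\right)^2 + \left(\tfrac{3-\sqrt{2}}{2}\right)^2 = \tfrac{11}{4} + \tfrac{3}{2}\sqrt{2} + \tfrac{1}{2} + \tfrac{11}{4} - \tfrac{3}{2}\sqrt{2} = \tfrac{24}{4} = 6. \]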
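
Second, the coordinates of $[v]_{\mathcal{U}}$ are scaled through left multiplication by $D$.

\[ D[v]_{\mathcal{U}} = \begin{pmatrix} \frac{3+\sqrt{2}}{5} \cdot \frac{3+\sqrt{2}}{2} \\ \frac{3}{5} \cdot \left(-\frac{\sqrt{2}}{2}\right) \\ \frac{3-\sqrt{2}}{5} \cdot \frac{3-\sqrt{2}}{2} \end{pmatrix} = \begin{pmatrix} \frac{11+6\sqrt{2}}{10} \\ -\frac{3\sqrt{2}}{10} \\ \frac{11-6\sqrt{2}}{10} \end{pmatrix}. \]

Finally, this vector is written in terms of the original coordinates by left multiplication by $(Q^T)^{-1} = Q$.

\begin{align*}
w &= \left[ D[v]_{\mathcal{U}} \right]_{\mathcal{S}} \\
  &= Q\left(D[v]_{\mathcal{U}}\right) \\
  &= \tfrac{11+6\sqrt{2}}{10}\, u_1 - \tfrac{3\sqrt{2}}{10}\, u_2 + \tfrac{11-6\sqrt{2}}{10}\, u_3 \\
  &= \begin{pmatrix} 4/5 \\ 6/5 \\ 7/5 \end{pmatrix}.
\end{align*}

The reader should verify that this final orthogonal transformation preserves the vector norm (that is,
$\|w\| = \|D[v]_{\mathcal{U}}\|$).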

As a final observation, consider the fact that $\|w\| < \|v\|$. The action of the heat diffusion
transformation is decomposed into three successive transformations, two of which preserve vector
norm, and a third that performs a coordinate-wise vector scaling by the eigenvalues of the transformation.
As the eigenvalues of $E$ all satisfy $0 < \lambda_k < 1$, $\|Ev\| < \|v\|$ for any nonzero $v$.
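
The three-step view of one diffusion step can be reproduced numerically. The sketch below uses plain Python and assumes the heat state $v = (1, 1, 2)^T$, which is the vector consistent with the inner products computed above; the variable names are illustrative only.

```python
import math

s = math.sqrt(2.0)

# Diffusion matrix E for m = 3 and delta = 1/5, with its eigenvalues and
# orthonormal eigenvectors (U[k] holds u_{k+1}, i.e. a column of Q).
E = [[0.6, 0.2, 0.0], [0.2, 0.6, 0.2], [0.0, 0.2, 0.6]]
lams = [(3 + s) / 5, 3 / 5, (3 - s) / 5]
U = [[0.5, s / 2, 0.5], [s / 2, 0.0, -s / 2], [0.5, -s / 2, 0.5]]

v = [1.0, 1.0, 2.0]   # assumed heat state in the standard basis

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(x):
    return math.sqrt(dot(x, x))

coords = [dot(u, v) for u in U]                       # step 1: Q^T v
scaled = [lam * a for lam, a in zip(lams, coords)]    # step 2: D [v]_U
w = [sum(scaled[k] * U[k][i] for k in range(3))       # step 3: Q (D [v]_U)
     for i in range(3)]

direct = [dot(row, v) for row in E]                   # E v computed directly
print(w, direct)
print(norm(v), norm(coords), norm(scaled), norm(w))
```

Steps 1 and 3 leave the norm unchanged, so the whole decrease from $\|v\|$ to $\|w\|$ happens in the diagonal scaling of step 2.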


7.3.5 Exercises 


1. For each of the following matrices, determine whether or not A is an orthogonal matrix. If not, 
find a matrix whose column space is the same as the column space of A, but is orthogonal. 


1h —!p 010 
(a) A= 
Ih Ih (c) A= 100 
11 1 001 
(b) A= ]2-2 0 73 
= as gl 
10-1 @ A=] 0 Fa 
11 1 
2 fz 2 


2. Suppose $Q$ and $R$ are $n \times n$ orthogonal matrices. Is $QR$ an orthogonal matrix? Justify.
3. Show that the matrix $M$ of Example 7.3.14 is not invertible.


sin@ cosé 
a EE AG tee — sin P| 


(a) For what values of $\theta$ is $A_\theta$ orthogonal?


(b) Draw the vector u = and its images under the transformation $T(u) = A_\theta u$ for $\theta =
45^\circ, 90^\circ, 30^\circ, 240^\circ$.


(c) Use your results to state what multiplication by $A_\theta$ is doing geometrically.
5. Draw a figure that shows that the transformation defined in Example 7.3.6 does not represent a
rotation, but is in fact the reflection stated in the example. (Hint: To where do $e_1$ and $e_2$ map?)
6. Provide an orthogonal diagonalization of the following symmetric matrices.


(2 1 J3 1 0 
@ m= (7 /). () C=[ 1 -V30 
02 0 01 
) HG 5) 012 
(dd) R=|1-21 
210 


7. Is the set of all $3 \times 3$ orthogonal matrices a subspace of $\mathcal{M}_{3 \times 3}(\mathbb{R})$? Justify.
8. Use geometric arguments to show that the counterclockwise rotation transformation in Example 7.3.8
is diagonalizable only if $\theta = n\pi$, $n \in \mathbb{Z}$.
9. Show that the reflection transformation (about a line) in $\mathbb{R}^2$ has two distinct eigenvalues $\lambda_1$ and
$\lambda_2$, each of whose eigenspaces has dimension 1.
10. Use the family of rotation matrices of Example 7.3.5 and Corollary 4.2.27 to derive the angle
sum formulae and triple angle formulae for sine and cosine:

\begin{align*}
\cos(\theta + \phi) &= \cos\theta \cos\phi - \sin\theta \sin\phi, \\
\sin(\theta + \phi) &= \cos\theta \sin\phi + \cos\phi \sin\theta, \\
\cos 3\theta &= 4\cos^3\theta - 3\cos\theta, \\
\sin 3\theta &= 3\sin\theta - 4\sin^3\theta.
\end{align*}


11. Show that the transformation in Example 7.3.15 preserves angles between vectors but does not 
preserve vector norm. Explain this result. 

12. Show that for every real matrix $A$, both $AA^T$ and $A^TA$ are orthogonally diagonalizable. What are
the necessary and sufficient conditions on $A$ for $AA^T$ to be invertible? For $A^TA$ to be invertible?

13. Let $Q$ be an $n \times n$ orthogonal matrix. Prove that $QQ^T = I_n$. (Notice the difference between this
claim and Lemma 7.3.2.)

14. Is the converse of Lemma 7.3.2 true? That is, if $Q^TQ = I$, is $Q$ necessarily an orthogonal matrix?
This question is asking whether or not the condition $Q^TQ = I$ is a necessary and sufficient
condition for $Q$ to be orthogonal. Prove or give a counterexample.

15. Show that left multiplication by the matrix M, given in Example 7.3.7 performs the indicated 
reflection transformation. 

16. Complete the proof of Theorem 7.3.3. 

17. Prove Corollary 7.3.13. Notice that because every symmetric matrix is orthogonally diagonalizable,
$0 < k \le n$.

18. Show that the nullspace of the heat diffusion transformation is the trivial subspace.

19. Verify that the eigenvectors of the heat diffusion transformation, given in Equation 7.7, form an
orthonormal set. In particular, show that

\[ \langle u_j, u_k \rangle = \begin{cases} 0 & \text{if } j \neq k, \\ 1 & \text{if } j = k. \end{cases} \]

20. Verify that the eigenvalues of the heat diffusion transformation are those given in Equation 7.8.
21. Prove the converse of Lemma 7.3.10. 


The following exercises explore properties of transformations on complex-valued vector spaces. 


22. Consider transformation $T : \mathbb{C}^n \to \mathbb{C}^n$ defined by left multiplication by matrix $M \in \mathcal{M}_{n \times n}(\mathbb{C})$.
The adjoint of $M$, written $M^*$, is $M^* = \overline{M}^T$ (the conjugate transpose). A matrix equal to its
adjoint, $M = M^*$, is said to be self-adjoint or Hermitian. Show that every real symmetric matrix is
self-adjoint.

23. Prove that every eigenvalue of a self-adjoint transformation is real.

24. Consider transformation $T : \mathbb{C}^n \to \mathbb{C}^n$ and define $T_1 = \frac{1}{2}(T + T^*)$ and $T_2 = \frac{1}{2i}(T - T^*)$.
Notice that $T = T_1 + iT_2$. Prove that $T_1$ and $T_2$ are self-adjoint.

25. A matrix $M \in \mathcal{M}_{n \times n}(\mathbb{C})$ is said to be normal if $MM^* = M^*M$. Show that every real symmetric
matrix is normal. Show, by specific example, that not every normal matrix is symmetric.

26. A matrix $Q \in \mathcal{M}_{n \times n}(\mathbb{C})$ is said to be unitary if $Q^{-1} = Q^*$. Prove that if $Q$ is a unitary matrix,
then $\|Qx\| = \|x\|$ for all $x \in \mathbb{C}^n$.

27. A matrix $M \in \mathcal{M}_{n \times n}(\mathbb{C})$ is said to be unitarily diagonalizable if $M = QDQ^*$, where $Q$ is
unitary and $D \in \mathcal{M}_{n \times n}(\mathbb{C})$ is diagonal. Prove that matrix $M$ is normal if, and only if, $M$ is
unitarily diagonalizable.


7.4 Exploration: Pseudo-Inverting the Non-invertible


Our study of the heat diffusion process leads us to the discovery of a variety of important linear 
algebra topics. We found that for some linear transformations T : V — V, there are entire subspaces 
of eigenvectors that are transformed only by multiplication by a scalar (eigenvalue). Whenever we 
are able to find a basis (for V) of such vectors, transformation computations become almost trivial 
because we represent vectors as coordinate vectors relative to that basis. The key step is to recognize 
that such transformations are diagonalizable: relative to an eigenbasis, the matrix representation is
diagonal. This led to our ability to perform repeated application of the transformation quickly and 
allows us to study long-term behavior. We also noticed that the heat diffusion process is represented as 
a symmetric matrix and our investigation in Section 7.3 led us to see that every real symmetric matrix 
is diagonalizable. 

Meanwhile, we are still investigating how we might “invert” the radiographic transformation T, a 
transformation that is typically not one-to-one. The matrix representation $M$, relative to any basis, is
typically non-square and therefore not invertible. However, in our study of transformation spaces in 
Section 5.1, we came very close to discovering an isomorphism between the column space and row 
space of a matrix. In this section, we will revisit this idea as the Maximal Isomorphism Theorem, then 
learn how this theorem provides a way to recover any non-invisible object from the corresponding 
radiograph. 


7.4.1 Maximal Isomorphism Theorem


The transformation that allows us to solve T(x) = b when T is not invertible is known as the pseudo- 
inverse and is defined as follows. 

Definition 7.4.1


Let $V$ and $W$ be vector spaces and $T : V \to W$ linear. The transformation $P : W \to V$ is called
the pseudo-inverse of $T$ if and only if $P(T(x)) = x$ for all $x \in \text{Null}(T)^\perp$.


The pseudo-inverse differs from the left inverse (see Definition 4.7.53) in that if $S$ is a left inverse
of $T$, then $S(T(x)) = x$ for all $x \in V$ instead of only for $x \in \text{Null}(T)^\perp$.

We are now ready to state a key invertibility theorem. We see that a linear transformation T is 
invertible from the range to a restricted subspace of the domain. 


Theorem 7.4.2
Let $V$ and $W$ be vector spaces of dimension $n$ and $m$, respectively, and let $T : V \to W$ be a
linear transformation. Then $T : \text{Null}(T)^\perp \to \text{Ran}(T)$ is an isomorphism.


Proof. Let $V$ and $W$ be vector spaces and suppose $T : V \to W$ is a linear transformation. We
know that the only vector in $\text{Null}(T)^\perp$ that maps to $0_W$ is $0_V$. Therefore, by Theorem 5.1.21,
$T : \text{Null}(T)^\perp \to \text{Ran}(T)$ is one-to-one. By Theorem 5.1.20, $T : \text{Null}(T)^\perp \to \text{Ran}(T)$ is an onto
transformation. Therefore, $T : \text{Null}(T)^\perp \to \text{Ran}(T)$ is an isomorphism.


Theorem 7.4.2 tells us that the set of nonzero possible radiographs is in one-to-one correspondence 
with the set of non-invisible objects that have no nullspace components. And furthermore, the 
isomorphism between them is the radiographic transformation itself. To help us find the inverse 
transformation, we again consider the corresponding matrix representation and spaces. 


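Theorem 7.4.3 [Maximal Isomorphism Theorem]

Let $U : \mathbb{R}^n \to \mathbb{R}^m$ be the linear transformation defined by $U(x) = Mx$ where $M$ is an $m \times n$
matrix. Then, $\tilde{U} : \text{Row}(M) \to \text{Col}(M)$, defined by $\tilde{U}(x) = U(x) = Mx$ for all $x \in \text{Row}(M)$,
is an isomorphism.


Proof. Let $U$ and $M$ be as above. Also, suppose $r_1^T, r_2^T, \ldots, r_m^T$ are the rows of $M$. For $v \in \text{Null}(U)$,

\[ 0 = Mv = \begin{pmatrix} r_1^T v \\ r_2^T v \\ \vdots \\ r_m^T v \end{pmatrix} = \begin{pmatrix} \langle r_1, v \rangle \\ \langle r_2, v \rangle \\ \vdots \\ \langle r_m, v \rangle \end{pmatrix}. \]

Therefore, $\langle r_k, v \rangle = 0$ for every $v \in \text{Null}(U)$ and every $1 \le k \le m$. (In particular, if $r_k \in \text{Null}(U)$
for some $1 \le k \le m$, then $\langle r_k, r_k \rangle = 0$; that is, $r_k = 0$.) So, we know that
$r_k \in \text{Null}(U)^\perp$ for all $1 \le k \le m$. That is, $\text{Row}(M) \subseteq \text{Null}(U)^\perp$. Now, we know that
$\mathbb{R}^n = \text{Null}(U) \oplus \text{Null}(U)^\perp$. Therefore, $n = \dim(\mathbb{R}^n) = \text{Nullity}(U) + \dim(\text{Null}(U)^\perp)$. We also know,
by the Rank-Nullity Theorem (Theorem 5.1.30), that $n = \text{Nullity}(U) + \text{Rank}(U)$. Therefore,
$\dim(\text{Row}(M)) = \text{Rank}(U) = \dim(\text{Null}(U)^\perp)$. Therefore, $\text{Row}(M) = \text{Null}(U)^\perp$.

Now, by Theorem 7.4.2, $\tilde{U} : \text{Row}(M) \to \text{Col}(M)$ is an isomorphism.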

Theorem 7.4.3 tells us that we may restrict the transformation U to a particular subspace of its
domain (the row space of the associated matrix) and the resulting map will be an isomorphism. The 
theorem is called the Maximal Isomorphism Theorem because one cannot restrict U to any larger 
subspace and still have an isomorphism. 
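
A very small worked example may make the statement concrete. The $1 \times 2$ matrix below is a hypothetical illustration chosen here, not one taken from the text.

```latex
\[
M = \begin{pmatrix} 1 & 1 \end{pmatrix}, \qquad
\text{Row}(M) = \text{Span}\left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}, \qquad
\text{Null}(M) = \text{Span}\left\{ \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}, \qquad
\text{Col}(M) = \mathbb{R}.
\]
\[
\tilde{U}\!\left( t \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right)
  = \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} t \\ t \end{pmatrix} = 2t ,
  \qquad t \in \mathbb{R}.
\]
```

The restricted map $t(1,1)^T \mapsto 2t$ is one-to-one and onto $\mathbb{R}$, so it is an isomorphism between two one-dimensional spaces. The only subspace of $\mathbb{R}^2$ properly containing $\text{Row}(M)$ is $\mathbb{R}^2$ itself, which contains the nonzero null vector $(1,-1)^T$, and on it the map is no longer one-to-one.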


Corollary 7.4.4

Let $M$ be an $m \times n$ matrix and $U : \mathbb{R}^n \to \mathbb{R}^m$ be the transformation defined by $U(x) = Mx$.
Suppose $V \subseteq \mathbb{R}^n$ and $W \subseteq \mathbb{R}^m$ are subspaces of dimension $k$, and $\tilde{U} : V \to W$ given by $\tilde{U}(x) =
U(x) = Mx$ is an isomorphism. Then $k \le \text{Rank}(M)$.


We leave the proof as Exercise 15 in the Additional Exercises section. 

The Maximal Isomorphism Theorem is a powerful step toward tomographic reconstruction of 
radiographs formed by the radiographic transformation $T$. In terms of a matrix representation $M$ of $T$,
it is possible to reconstruct any object represented as a linear combination of the rows of M. These 
objects are in one-to-one correspondence with the radiographs represented as linear combinations of 
the columns of M. This notion of invertibility does not rely on any specific conditions on the rank or 
nullity of M, nor does it require the transformation to be one-to-one, nor does it require T to be onto. 
The inverse of the maximal isomorphism is the pseudo-inverse given in Definition 7.4.1. However, we 
do not yet know exactly how to find such a maximal isomorphism. Until we can, we are also not able 
to find the pseudo-inverse. The rest of this section is an exploration that will guide the reader toward 
a better understanding of the pseudo-inverse. 


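7.4.2 Exploring the Nature of the Data Compression Transformation


In this section, we will explore the conditions under which transformations that are not one-to-one can
be partially invertible. For this exercise, suppose we have a vector space $\mathcal{I}$ of $2 \times 3$ pixel images with
the geometry:

\[ v = \begin{array}{|c|c|c|} \hline x_1 & x_3 & x_5 \\ \hline x_2 & x_4 & x_6 \\ \hline \end{array} \]

Coordinate vectors for vectors in $\mathcal{I}$ are in $\mathbb{R}^6$. We would like to see if we can store the image data as
a vector in $\mathbb{R}^5$ as a simple data compression scheme. Suppose we store only the column and row sums
as a vector in $\mathbb{R}^5$. That is, we let $f : \mathcal{I} \to \mathbb{R}^5$ be the transformation that takes an image and outputs
the compressed data vector $b$ defined as

\[ b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix} = \begin{pmatrix} x_1 + x_2 \\ x_3 + x_4 \\ x_5 + x_6 \\ x_1 + x_3 + x_5 \\ x_2 + x_4 + x_6 \end{pmatrix} = f\left( \begin{array}{|c|c|c|} \hline x_1 & x_3 & x_5 \\ \hline x_2 & x_4 & x_6 \\ \hline \end{array} \right) = f(v). \]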

1. Using the standard ordered bases for $\mathcal{I}$ and for $\mathbb{R}^5$, find $[u]$, $[f(u)]$ and $M = [f]$, where


(To simplify notation, we use [v] to mean the coordinate vector and [f] to mean the matrix 
representation of f according to the standard bases.) 


2. By inspection (and trial and error), find two linearly independent null vectors of $f$.
3. Determine $\text{Col}(M)$, $\text{Null}(M)$, $\text{Row}(M)$, $\text{Rank}(M)$, and $\text{Nullity}(M)$.


(It is advisable to use MATLAB/OCTAVE to find these. It might be useful to know that, in
MATLAB/OCTAVE, $M^T$ is found using transpose(M). The command M' computes the complex-
conjugate transpose of M, so for real matrices, these two methods are equivalent.)


4. Is $f$ one-to-one? Onto? Invertible?
5. For a general linear transformation, $A$, relate $\dim((\text{Null}(A))^\perp)$ to $\text{Rank}(A)$. Justify your
conclusion.


A Pseudo-Inverse 
Now, consider the vectors 


Y= 


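and the matrix transformation $P : \mathbb{R}^5 \to \mathbb{R}^6$ defined by (note the prefactor $1/30$):

\[ P(b) = \frac{1}{30} \begin{pmatrix} 12 & -3 & -3 & 8 & -2 \\ 12 & -3 & -3 & -2 & 8 \\ -3 & 12 & -3 & 8 & -2 \\ -3 & 12 & -3 & -2 & 8 \\ -3 & -3 & 12 & 8 & -2 \\ -3 & -3 & 12 & -2 & 8 \end{pmatrix} b. \]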

6. Use MATLAB/OCTAVE to compute $P(Mv_k)$ for $k = 1, 2, 3$. (You should find that $P(Mv_k) = v_k$
for some of the test vectors and $P(Mv_k) \neq v_k$ for others. You can also complete this task by
direct hand computation.)

7. Draw some conclusions about the possible nature of the matrix transformation $P$. Use the results
of Questions 5 and 6 and recall Corollary 4.7.50 and Theorem 4.7.37.

8. Based on your work thus far, how would you assess the effectiveness of recovering images
compressed using $M$ and recovered using $P$? How is this idea similar to, or distinct from, the
radiography/tomography problem?
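
For readers without MATLAB/OCTAVE at hand, the compress-then-recover experiment can also be sketched in plain Python with exact fractions. Everything here is an assumption made for illustration: the matrix `M` encodes the column and row sums in the order $b_1, \ldots, b_5$ described above, `P30` holds the entries of $30P$ as reconstructed from the legible rows of the printed matrix, and the three test vectors are hypothetical (they are not the vectors $v_1, v_2, v_3$ of the exploration).

```python
from fractions import Fraction

# Assumed matrix of f: three column sums followed by two row sums.
M = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
]

# Assumed entries of 30 * P (the prefactor 1/30 is applied in recover()).
P30 = [
    [12, -3, -3, 8, -2],
    [12, -3, -3, -2, 8],
    [-3, 12, -3, 8, -2],
    [-3, 12, -3, -2, 8],
    [-3, -3, 12, 8, -2],
    [-3, -3, 12, -2, 8],
]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def recover(x):
    """Compress x with M, then apply P to the compressed data."""
    b = matvec(M, x)
    return [Fraction(t, 30) for t in matvec(P30, b)]

x_row = [1, 0, 1, 0, 1, 0]      # hypothetical: equals row 4 of M, so it lies in Row(M)
x_null = [1, -1, -1, 1, 0, 0]   # hypothetical: all column and row sums vanish
x_mixed = [a + b for a, b in zip(x_row, x_null)]

print(recover(x_row))    # compare with x_row
print(recover(x_null))   # compare with x_null
print(recover(x_mixed))  # compare with x_mixed and with x_row
```

Exact `Fraction` arithmetic is used so that equality tests are not blurred by round-off; comparing each output with its input is one way to start on Questions 6 and 7.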


Conjecturing about a General Pseudo-Inverse 


9. You will recall that if $S$ is an $n \times n$ symmetric matrix of rank $n$, then $S$ is orthogonally
diagonalizable. So, we can write

\[ S = QDQ^T \quad \text{and} \quad S^{-1} = QD^{-1}Q^T, \]

where $Q$ is an $n \times n$ orthogonal matrix satisfying $Q^T = Q^{-1}$ and $D$ is a diagonal matrix whose
entries are the (nonzero) eigenvalues of $S$. Now, we know, for $S$ with rank $r < n$, that $D$ has $r$
nonzero entries. Verify that a pseudo-inverse of $S$ is

\[ P = \tilde{Q} \tilde{D}^{-1} \tilde{Q}^T, \]

where $\tilde{D}$ is the $r \times r$ diagonal matrix whose entries are the nonzero eigenvalues of $S$, and $\tilde{Q}$ is
the $n \times r$ matrix composed of the columns of $Q$ corresponding to the same eigenvalues.

10. Use the interpretation of the orthogonal diagonalization on Page 439 to interpret the fact that if
$Sx = b$ and $S = QDQ^T$, then $D(Q^Tx) = (Q^Tb)$.

11. We wish to employ similar methods for creating a pseudo-inverse of a non-square matrix M. What 
type of decomposition of M would lend itself to such a construction? (Be careful to consider the 
properties of Q and D for a square matrix. Which of these properties do we need when M is 
m x n instead?) Applying your conjecture, construct the pseudo-inverse of M. 
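
The construction in Question 9 can be tried on a small case before attempting the verification. The sketch below is plain Python and uses a hypothetical rank-2 symmetric matrix with eigenvalues $0, -3, -3$, chosen only for illustration; the names `nonzero_pairs`, `x_perp`, and `x_null` are likewise illustrative.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Hypothetical example: symmetric, rank 2, eigenvalues 0, -3, -3.
S = [[-2.0, 1.0, 1.0], [1.0, -2.0, 1.0], [1.0, 1.0, -2.0]]

# Orthonormal eigenvectors for the nonzero eigenvalue -3 (the columns of Q-tilde).
q2 = [-1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
q3 = [-1 / math.sqrt(6), -1 / math.sqrt(6), 2 / math.sqrt(6)]
nonzero_pairs = [(-3.0, q2), (-3.0, q3)]

# P = Q-tilde D-tilde^{-1} Q-tilde^T, written as a sum of (1/lambda) q q^T
# over the nonzero eigenvalues only.
P = [[sum(q[i] * q[j] / lam for lam, q in nonzero_pairs) for j in range(3)]
     for i in range(3)]

x_perp = [1.0, -2.0, 1.0]    # orthogonal to the null vector (1, 1, 1)
x_null = [1.0, 1.0, 1.0]     # spans Null(S)

recovered = matvec(P, matvec(S, x_perp))   # P(S x) for x in Null(S)-perp
lost = matvec(P, matvec(S, x_null))        # P(S x) for x in Null(S)
print(recovered, lost)
```

Dropping the zero eigenvalue is what makes $\tilde{D}$ invertible; the price is that any component of $x$ in $\text{Null}(S)$ cannot be recovered.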


7.4.3 Additional Exercises 


12. Let $S$ be an $n \times n$ symmetric matrix of rank $r < n$. Show that $S$ can be exactly reconstructed
from $r$ eigenvectors and the $r$ nonzero eigenvalues.

13. Let $S$ be an $n \times n$ symmetric matrix with nonnegative eigenvalues. Find the "square root" matrix
$A$ where $A^2 = S$.

14. Let $V$ and $W$ be vector spaces. Suppose $T : V \to W$ is a one-to-one linear transformation. Show
that the left inverse of $T$ is a pseudo-inverse.

15. Prove Corollary 7.4.4. 


7.5 Singular Value Decomposition 


The tomography problem, recovering an object description from radiographic data, has, thus far,
remained stubbornly intractable. We have found that if the radiographic transformation is one-to-one, 
then we can utilize the left inverse transformation. However, this would require an unreasonably large 
number of radiographs. Now, we understand that at least some transformations T that are not one-to- 
one do have a pseudo-inverse P. If this is the case for the tomography problem, then we can reconstruct 
brain images from, possibly few, radiographs. 

The big concept of this section is that, in fact, every matrix has a (generalized) diagonalization and 
a pseudo-inverse. This diagonal form is called the singular value decomposition, or SVD. 

The singular value decomposition of a matrix is important and useful in a variety of contexts. 
Consider a few representative applications. 


Inverse Problems. The singular value decomposition can be used to construct pseudo-inverse 
transformations. Our key example is in brain scan tomography. Figure 7.19 shows example brain image 
reconstructions obtained from non-invertible radiographic transformations using varying number of 
views. More views result in better reconstruction detail. This example uses very-low noise radiographic 
data. 


Data Compression. The singular value decomposition can be used to approximate data sets stored 
as arrays, such as images. Approximate representations are stored with relatively small memory 
footprint as a small collection of vectors. The image can be approximately recovered from these 
vectors. Figure 7.20 shows an example of the quality of recovery. 

Fig. 7.19 Brain scan pseudo-inverse reconstructions of slice 50 using varying number of radiographic views (indicated
in each subfigure). 


Fig.7.20 Example of data compression using singular value decomposition. At left is the original RGB image of Indian 
Paintbrush. At right is a reconstruction obtained from vectors comprising 30% of the memory footprint as the original 
image. 


Principal Components. The singular value decomposition can be used to determine the principal
features in a data set in which each instance is a vector in $\mathbb{R}^n$. Sunspots observed on the surface of the
sun wax and wane over a cycle of roughly 9-13 years. Figure 7.21 shows five representative sunspot
abundance curves as colored lines (the data source is given in the footnote). Times are normalized by
percent of time through a given cycle, and sunspot abundance is reported according to a standard
formula as sunspot number (the Wolf number).
The five curves illustrate the diversity of sunspot evolution from cycle to cycle. When the sunspot data
are represented as a data matrix, we will see that singular value decomposition provides a principal
(unit) vector $v$ and a vector of amplitudes $a$. We will also see that the principal vector is the sunspot
number evolution curve that explains most of the trend. The individual curves are approximated by
$v$ scaled by the corresponding amplitude from $a$. The principal vector is the gray curve in the figure


Footnote: Monthly sunspot data was obtained from the SILSO, Royal Observatory of Belgium, Brussels
(http://www.sidc.be/silso/datafiles).




[Figure 7.21 plot: Sunspot Number (vertical axis) versus Percent of Time Through Sunspot Cycle (horizontal axis)]


Fig. 7.21 Five representative sunspot number cycle curves (colors) and the principal vector (gray) that explains most 
of the trend. The principal vector is arbitrarily scaled for visual clarity. 


location of various features) in the gray curve and to not focus on the amplitude. Particular features 
that may be of interest to researchers are where the maximum occurs, how long it takes to reach half 
the maximum, how fast the curve rises, what is the typical variation around the maximum, etc. This 
principal vector can also be used to predict the evolution of a partially complete sunspot cycle. 


7.5.1 The Singular Value Decomposition 


We have seen that every symmetric matrix has an eigenvalue diagonalization. In particular, every 
symmetric matrix can be represented as an orthogonal change to eigenbasis coordinates, followed by 
a diagonal transformation stretching each eigenvector by its eigenvalue, followed by a change back to 
the original coordinates through an orthogonal transformation. We will show that, in an analogous way, 
every matrix has a (generalized) diagonal representation. That is, there exist orthogonal coordinates 
for the row space and column space so that the transformation is diagonal. First, we need to generalize 
the definition of diagonal matrix. 


Definition 7.5.1 


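Let $M$ be an $m \times n$ matrix with $(i, j)$-entry $m_{ij}$. We say that $M$ is a diagonal matrix if and only
if $m_{ij} = 0$ whenever $i \neq j$.


An example of a $3 \times 4$ diagonal matrix is $D$, defined by

\[ D = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \]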
Now, we present the main result of this section. 

Theorem 7.5.2 [Singular Value Decomposition]

Let $M$ be an $m \times n$ real matrix of rank $r$. There exist an $m \times m$ orthogonal matrix $U$, an $n \times n$
orthogonal matrix $V$, and an $m \times n$ diagonal matrix $\Sigma$ such that $M = U\Sigma V^T$, where the diagonal
entries of $\Sigma$ are $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(n,m)} \ge 0$ and $\sigma_k = 0$ for $k > r$.


The proof is by construction. We show how to construct matrices $U$, $\Sigma$, and $V$, which satisfy the
conditions of the theorem. Therefore, the proof is important also in understanding how to compute the
SVD.


Proof. Let $M$ be an $m \times n$ real matrix of rank $r$. We will construct the $m \times m$ orthogonal matrix $U$,
$n \times n$ orthogonal matrix $V$, and $m \times n$ diagonal matrix $\Sigma$ such that $M = U\Sigma V^T$. By construction,
the diagonal of $\Sigma$ will have $r$ positive nonincreasing entries followed by $\min(m, n) - r$ zero entries.
(This means that if $M$ has more rows than columns, the number of zero entries along the diagonal is
$n - r$, but if $M$ has more columns than rows, there will be $m - r$ zeros along the diagonal.)

We know that the $n \times n$ matrix $M^TM$ and the $m \times m$ matrix $MM^T$ are both symmetric and both have
rank $r$. Therefore, by Theorem 7.3.12, both are orthogonally diagonalizable. So, we can write

\[ M^TM = VDV^T, \]

where $V$ is orthogonal with columns $v_1, \ldots, v_n$ and $D$ is diagonal with entries $d_1, \ldots, d_n$. Suppose,
for notational convenience, that the nonzero eigenvalues are $d_1, \ldots, d_r$, and $d_{r+1} = \cdots = d_n = 0$.
Similarly, we write

\[ MM^T = UCU^T, \]

where $U$ is orthogonal with columns $u_1, \ldots, u_m$ and $C$ is diagonal with entries $c_1, \ldots, c_m$.

Now, consider an arbitrary eigenvector $v_k$ for which the corresponding eigenvalue $d_k$ is nonzero.
Because $v_k$ is an eigenvector of $M^TM$ with eigenvalue $d_k$, we see that

\[ (MM^T)(Mv_k) = M(M^TMv_k) = M(d_k v_k) = d_k (Mv_k). \]

Moreover, $Mv_k \neq 0$ because $\|Mv_k\|^2 = \langle v_k, M^TMv_k \rangle = d_k \neq 0$. Therefore, $Mv_k$ is an eigenvector of
$MM^T$ with eigenvalue $d_k$. Thus, as we have freedom to
choose orthogonal eigenvectors as a basis for any eigenspace, each eigenvector $Mv_k$ of eigenvalue $d_k$
can be expressed as a scalar multiple of such a chosen eigenvector $u_j$ of eigenvalue $c_j = d_k$. We can
list the vectors $u_1, \ldots, u_m$ in a different order, so that $Mv_k = \sigma_k u_k$ for suitable nonzero constants
$\sigma_k$ and for all $1 \le k \le r$. Now, define $\sigma_{r+1} = \sigma_{r+2} = \cdots = \sigma_{\min\{m,n\}} = 0$ and define $\Sigma$ to be the
$m \times n$ diagonal matrix with diagonal entries $\sigma_1, \ldots, \sigma_{\min\{m,n\}}$.

We now claim that the matrix decomposition with these properties is realized by $M = U\Sigma V^T$.
Since $\{v_1, v_2, \ldots, v_n\}$ is a basis for $\mathbb{R}^n$, we can show this is true by showing that $Mv_k = U\Sigma V^T v_k$
for $1 \le k \le n$. Indeed, if we let $e_k$ be the vector whose $k$th entry is 1 and all others are 0, we have

\begin{align*}
U\Sigma V^T v_k &= U\Sigma \begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{pmatrix} v_k
 = U\Sigma \begin{pmatrix} \langle v_1, v_k \rangle \\ \langle v_2, v_k \rangle \\ \vdots \\ \langle v_n, v_k \rangle \end{pmatrix} \\
 &= U\Sigma e_k \\
 &= \begin{pmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_m \\ | & | & & | \end{pmatrix} (\sigma_k e_k) \\
 &= \sigma_k u_k = Mv_k.
\end{align*}

(For $k > r$, both sides are zero: $\Sigma e_k = 0$, and $Mv_k = 0$ because $\|Mv_k\|^2 = d_k = 0$.)
Thus, $M = U\Sigma V^T$.

Now, we show that $\sigma_k \ge 0$ for $1 \le k \le \min(n, m)$. This is done by showing that $D = \Sigma^T\Sigma$ and
$C = \Sigma\Sigma^T$, in which case we have that $\sigma_k^2$ are the eigenvalues of $M^TM$ and $MM^T$. We know that

\begin{align*}
VDV^T &= M^TM \\
 &= (U\Sigma V^T)^T (U\Sigma V^T) \\
 &= V\Sigma^T U^T U \Sigma V^T \\
 &= V\Sigma^T\Sigma V^T.
\end{align*}

Since $V$ and $V^T$ are invertible, $D = \Sigma^T\Sigma$. Similarly, $C = \Sigma\Sigma^T$. Now, $\Sigma^T\Sigma$ is an $n \times n$ diagonal
matrix with $\sigma_1^2, \sigma_2^2, \ldots$ along the diagonal. Thus, the real eigenvalues $d_k$ satisfy $d_k = \sigma_k^2$ for
$1 \le k \le n$ (with the convention $\sigma_k = 0$ for $k > \min(n, m)$). Similarly, the real eigenvalues $c_k$ satisfy
$c_k = \sigma_k^2$ for $1 \le k \le m$. We choose $\sigma_k \ge 0$ for
$1 \le k \le \min(n, m)$ and appropriately adjust the signs on the corresponding column vectors in $U$ (or
$V$), if necessary.

The decompositions of $M^TM$ and $MM^T$ are not unique, because different orderings of the columns of $V$
and $U$ give different decompositions. We can choose to order the columns so that the corresponding
$\sigma_k$ satisfy $\sigma_1 \ge \sigma_2 \ge \cdots$. Finally, by the rank-nullity theorem, $\sigma_k = 0$ whenever $k > r$.


The fact that every real matrix admits a singular value decomposition is very enlightening. Every
linear transformation, $T : V \to W$, can be expressed as a matrix relative to bases for the domain and
codomain. This matrix representation has a singular value decomposition $M = U\Sigma V^T$. Consider two
equivalent interpretations.


1. Left multiplication by $M$ is equivalent to a sequence of three matrix products. First, there is an
orthogonal transformation (a generalized rotation preserving distances and angles). Second, there
is a scaling along each new coordinate direction, expanding if $\sigma_k > 1$ and contracting if $\sigma_k < 1$.
Third, there is another orthogonal transformation to some codomain coordinates. Every linear
transformation can be viewed in terms of this sequence of three operations.

2. Consider the equation $Mx = b$ with equivalent representation $U\Sigma V^Tx = b$. Because the transpose
of an orthogonal matrix is also its inverse, we have $\Sigma(V^Tx) = (U^Tb)$. With this view, we see that
left multiplication by $U^T$ and $V^T$ represents codomain and domain coordinate transformations,
respectively, for which the overall transformation is diagonal. That is, if we use the columns of
$U$ as a basis $\mathcal{B}_c$ for the codomain and the columns of $V$ as a basis $\mathcal{B}_d$ for the domain, we have
$[M]_{\mathcal{B}_d}^{\mathcal{B}_c} [x]_{\mathcal{B}_d} = [b]_{\mathcal{B}_c}$, where $[M]_{\mathcal{B}_d}^{\mathcal{B}_c} = \Sigma$ is diagonal.

The proof of Theorem 7.5.2 demonstrates one method for finding the singular value decomposition
of an arbitrary real matrix $M$. According to the proof, we can decompose any matrix $M$ by finding the
eigenvectors and eigenvalues of $M^TM$ and $MM^T$. The right singular vectors (the columns of $V$) are
eigenvectors of the symmetric matrix $M^TM$. The singular values are the square roots of the eigenvalues of
$M^TM$. The left singular vectors (columns of $U$) can then be determined by $Mv_k = \sigma_k u_k$. Alternately,
we can find the left singular vectors as the eigenvectors of $MM^T$. We form the matrix $\Sigma$ using the
singular values. The conventional ordering for columns of $U$ and $V$ and corresponding diagonal entries
of $\Sigma$ is in order of non-increasing singular values, $\sigma_k$. Throughout this section, we will keep to this
convention. The following examples illustrate these ideas on small matrices.


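Example 7.5.3 Consider the matrix
\[ M = \begin{pmatrix} -1 & 2 \\ -3 & 3 \end{pmatrix}, \]
which has no real eigenvalues. We can construct a singular value decomposition for $M$. The first step
is to orthogonally diagonalize $M^TM$ as $VDV^T$. We have
\[ M^TM = VDV^T \approx \begin{pmatrix} -0.6576 & -0.7534 \\ +0.7534 & -0.6576 \end{pmatrix} \begin{pmatrix} +22.6018 & +0.0000 \\ +0.0000 & +0.3982 \end{pmatrix} \begin{pmatrix} -0.6576 & +0.7534 \\ -0.7534 & -0.6576 \end{pmatrix}. \]
We have ordered the eigenvectors so that the eigenvalues are in non-increasing order. The singular
values of $M$ are $\sigma_1 = \sqrt{d_1} \approx 4.7541$ and $\sigma_2 = \sqrt{d_2} \approx 0.6310$. The left singular vectors are
\[ u_1 = \frac{Mv_1}{\sigma_1} \approx \begin{pmatrix} +0.4553 \\ +0.8904 \end{pmatrix}, \qquad u_2 = \frac{Mv_2}{\sigma_2} \approx \begin{pmatrix} -0.8904 \\ +0.4553 \end{pmatrix}. \]
Alternately, we can compute $u_1$ and $u_2$ by diagonalizing the matrix $MM^T$ as
\[ MM^T = UDU^T \approx \begin{pmatrix} +0.4553 & -0.8904 \\ +0.8904 & +0.4553 \end{pmatrix} \begin{pmatrix} +22.6018 & +0.0000 \\ +0.0000 & +0.3982 \end{pmatrix} \begin{pmatrix} +0.4553 & +0.8904 \\ -0.8904 & +0.4553 \end{pmatrix}. \]
We find $u_1$ and $u_2$ as the columns of $U$. Thus, we have the singular value decomposition of $M$:
\[ M = U\Sigma V^T \approx \begin{pmatrix} +0.4553 & -0.8904 \\ +0.8904 & +0.4553 \end{pmatrix} \begin{pmatrix} +4.7541 & +0.0000 \\ +0.0000 & +0.6310 \end{pmatrix} \begin{pmatrix} -0.6576 & +0.7534 \\ -0.7534 & -0.6576 \end{pmatrix}. \]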


In the above example, we can find the eigenvectors and eigenvalues of a matrix using mathematical 
software. In fact, the above were found using the eig command in OCTAVE. 

We know that every real symmetric matrix is orthogonally diagonalizable. Such a diagonalization is 
not necessarily equivalent to the singular value decomposition because the matrix may have negative 
eigenvalues. However, the singular values are all nonnegative. This is possible because the left and 
right singular vectors need not be the same ($U \ne V$ in general).
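Example 7.5.4 Consider the real symmetric matrix
\[ M = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \]
which has orthogonal diagonalization
\[ M = QDQ^T = \begin{pmatrix} -\sqrt{2}/2 & \sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} -\sqrt{2}/2 & \sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{pmatrix}. \]
We can readily create the SVD in two steps. First, we change the sign of the first column of the left $Q$
matrix and the sign of the corresponding eigenvalue in $D$:
\[ M = \begin{pmatrix} \sqrt{2}/2 & \sqrt{2}/2 \\ -\sqrt{2}/2 & \sqrt{2}/2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} -\sqrt{2}/2 & \sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{pmatrix}. \]
Then, we reorder the entries of the diagonal matrix (to non-increasing order) and the corresponding
columns of the orthogonal matrices:
\[ M = U\Sigma V^T = \begin{pmatrix} \sqrt{2}/2 & \sqrt{2}/2 \\ \sqrt{2}/2 & -\sqrt{2}/2 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \sqrt{2}/2 & \sqrt{2}/2 \\ -\sqrt{2}/2 & \sqrt{2}/2 \end{pmatrix}. \]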


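The previous example suggests a general method for finding the SVD of any symmetric matrix.
Suppose the $n \times n$ symmetric matrix $M$ has orthogonal diagonalization $QDQ^T$, where the eigenvalues $\lambda_k$
(and corresponding eigenvectors $q_k$) are ordered so that $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n| \ge 0$. Then the SVD
is given by $M = U\Sigma V^T$, where $\sigma_k = |\lambda_k|$, $V = Q$, and the columns of $U$ are $\operatorname{sgn}(\lambda_k) q_k$ (taking
$u_k = q_k$ whenever $\lambda_k = 0$, so that $U$ remains orthogonal), where
\[ \operatorname{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ \phantom{-}0 & \text{if } x = 0 \\ \phantom{-}1 & \text{if } x > 0 \end{cases}. \]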
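
The sign-flip construction for symmetric matrices can be tried out as follows. This is a minimal OCTAVE sketch using the matrix of Example 7.5.4; the variable names are ours, and the choice of +1 for zero eigenvalues is an assumption made so that U stays orthogonal.

```octave
% Sketch: SVD of a symmetric matrix from its orthogonal diagonalization.
M = [1 2; 2 1];                            % matrix of Example 7.5.4

[Q, D] = eig(M);                           % M = Q*D*Q'
lambda = diag(D);
[~, idx] = sort(abs(lambda), 'descend');   % order by |lambda_k|
lambda = lambda(idx);
Q = Q(:, idx);

s = sign(lambda);
s(s == 0) = 1;                             % assumption: keep u_k = q_k when lambda_k = 0
U = Q * diag(s);                           % columns sgn(lambda_k) * q_k
Sigma = diag(abs(lambda));                 % sigma_k = |lambda_k|
V = Q;

disp(diag(Sigma)');                        % singular values 3 and 1
disp(norm(M - U * Sigma * V'));            % near machine precision
```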


The next example demonstrates the construction of the singular value decomposition of an m x n 
matrix with m <n. 


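Example 7.5.5 Consider the nonsquare matrix
\[ M = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \end{pmatrix}. \]
In order to compute the SVD, we begin by finding the orthogonal diagonalization $M^TM = VDV^T$,
where $D$ is diagonal with entries $d_1 \approx 7.606$, $d_2 \approx 0.394$, $d_3 = 0$, and
\[ V \approx \begin{pmatrix} -0.3197 & -0.7513 & +0.5774 \\ -0.8105 & -0.0988 & -0.5774 \\ -0.4908 & +0.6525 & +0.5774 \end{pmatrix}. \]
We have the singular values of $M$: $\sigma_1 = \sqrt{d_1} \approx 2.758$ and $\sigma_2 = \sqrt{d_2} \approx 0.6280$. The right
singular vectors $v_k$ are the columns of $V$. The left singular vectors are computed as $u_k = Mv_k/\sigma_k$ for all $k$ such that
$\sigma_k > 0$. We have
\[ u_1 = \frac{Mv_1}{\sigma_1} \approx \begin{pmatrix} -0.8817 \\ -0.4719 \end{pmatrix}, \qquad u_2 = \frac{Mv_2}{\sigma_2} \approx \begin{pmatrix} -0.4719 \\ +0.8817 \end{pmatrix}. \]
The singular value decomposition is
\[ M = U\Sigma V^T \approx \begin{pmatrix} -0.8817 & -0.4719 \\ -0.4719 & +0.8817 \end{pmatrix} \begin{pmatrix} 2.758 & 0 & 0 \\ 0 & 0.628 & 0 \end{pmatrix} \begin{pmatrix} -0.3197 & -0.8105 & -0.4908 \\ -0.7513 & -0.0988 & +0.6525 \\ +0.5774 & -0.5774 & +0.5774 \end{pmatrix}. \]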


The matrix spaces associated with a transformation are simply defined in terms of the columns of 
the orthogonal matrices U and V. 


Corollary 7.5.6 
Let $M$ be an $m \times n$ real matrix of rank $r$ with singular value decomposition $M = U\Sigma V^T$. Then


(a) $\{v_{r+1}, \dots, v_n\}$ is an orthonormal basis for $\operatorname{Null}(M)$.
(b) $\{v_1, \dots, v_r\}$ is an orthonormal basis for $\operatorname{Null}(M)^\perp = \operatorname{Row}(M)$.
(c) $\{u_1, \dots, u_r\}$ is an orthonormal basis for $\operatorname{Ran}(M) = \operatorname{Col}(M)$.


Proof. Suppose $M$ is an $m \times n$ real matrix of rank $r$ with singular value decomposition $M = U\Sigma V^T$.
Let $\sigma_1, \sigma_2, \dots, \sigma_{\min\{m,n\}}$ be the diagonal entries of $\Sigma$. Notice that $Mv = 0$ for any vector $v$ in
$S = \{v_{r+1}, v_{r+2}, \dots, v_n\}$ because $\sigma_k = 0$ for $k > r$. Therefore, $S \subseteq \operatorname{Null}(M)$. By the rank-nullity theorem,
the nullspace of $M$ has dimension $n - r$, and $S$ is a linearly independent set of $n - r$ null vectors of
$M$ (see Equation 7.9), and so $S$ is a basis for $\operatorname{Null}(M)$. The dimension of $\operatorname{Null}(M)^\perp$ is $r$. Now,
$\{v_1, \dots, v_r\}$ is a linearly independent set of $r$ vectors in $\mathbb{R}^n$, each orthogonal to every vector in $S$
(the columns of $V$ are orthonormal), and so the set is a
basis for $\operatorname{Null}(M)^\perp = \operatorname{Row}(M)$. Finally, by Equation 7.9, $\operatorname{Ran}(M) = \operatorname{Span}\{u_1, u_2, \dots, u_r\}$. And as
$\{u_1, u_2, \dots, u_r\}$ is a set of $r$ linearly independent vectors in $\mathbb{R}^m$, it is also a basis for $\operatorname{Ran}(M)$.
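
Corollary 7.5.6 can be checked numerically by slicing the columns of U and V. The following OCTAVE sketch assumes the matrix M = [1 2 1; 0 1 1] of Example 7.5.5; the tolerance used to decide which singular values count as zero is our own choice, not something prescribed by the text.

```octave
% Sketch: read off bases for Null(M), Row(M), and Ran(M) from the SVD.
M = [1 2 1; 0 1 1];                   % assumed matrix of Example 7.5.5
[U, S, V] = svd(M);
sv = diag(S);

tol = max(size(M)) * eps(max(sv));    % assumed threshold for "numerically zero"
r = sum(sv > tol);                    % numerical rank

nullBasis = V(:, r+1:end);            % v_{r+1}, ..., v_n
rowBasis  = V(:, 1:r);                % v_1, ..., v_r
ranBasis  = U(:, 1:r);                % u_1, ..., u_r

disp(norm(M * nullBasis));            % near zero: these are null vectors
disp(norm(rowBasis' * nullBasis));    % near zero: Row(M) is orthogonal to Null(M)
```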


Example 7.5.7 Consider the matrix representation, $T$, of the radiographic transformation of the
following scenario, and interpret the singular value decomposition $T = U\Sigma V^T$.

Suppose we have an object space of $N = 4$ voxel values and $M = 6$ radiograph pixels in a three-view
arrangement at angles $-45^\circ$, $0^\circ$, and $+45^\circ$. Suppose each view has two pixels and the pixels are
scaled (ScaleFac $= \sqrt{2}$) to include a full view of the object space. The reader should take the time to
verify the following results. First, the radiographic transformation is given by
\[ T = \begin{pmatrix} 1 & 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1/2 & 1 & 0 & 1/2 \\ 1/2 & 0 & 1 & 1/2 \end{pmatrix}. \]
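The symmetric matrix $T^TT$ has orthogonal diagonalization $VDV^T$, where the columns of $V$ are
\[ v_1 = \begin{pmatrix} +1/2 \\ +1/2 \\ +1/2 \\ +1/2 \end{pmatrix}, \quad v_2 = \begin{pmatrix} -1/2 \\ -1/2 \\ +1/2 \\ +1/2 \end{pmatrix}, \quad v_3 = \begin{pmatrix} -1/2 \\ +1/2 \\ -1/2 \\ +1/2 \end{pmatrix}, \quad v_4 = \begin{pmatrix} -1/2 \\ +1/2 \\ +1/2 \\ -1/2 \end{pmatrix}, \]
and the corresponding eigenvalues (diagonal elements of $D$) are $d_1 = 6$, $d_2 = 3$, $d_3 = 1$, and $d_4 = 0$.
The singular values of $T$ (diagonal elements of $\Sigma$) are $\sigma_k = \sqrt{d_k}$: $\sigma_1 = \sqrt{6}$, $\sigma_2 = \sqrt{3}$, $\sigma_3 = 1$, and
$\sigma_4 = 0$. The transformation has rank $r = 3$ (the number of nonzero singular values) and the first three
columns of $U$ are given by $u_k = Tv_k/\sigma_k$:
\[ u_1 = \begin{pmatrix} +1/\sqrt{6} \\ +1/\sqrt{6} \\ +1/\sqrt{6} \\ +1/\sqrt{6} \\ +1/\sqrt{6} \\ +1/\sqrt{6} \end{pmatrix}, \quad u_2 = \begin{pmatrix} -1/\sqrt{12} \\ +1/\sqrt{12} \\ -1/\sqrt{3} \\ +1/\sqrt{3} \\ -1/\sqrt{12} \\ +1/\sqrt{12} \end{pmatrix}, \quad u_3 = \begin{pmatrix} -1/2 \\ +1/2 \\ 0 \\ 0 \\ +1/2 \\ -1/2 \end{pmatrix}. \]
The remaining three columns of $U$ consist of a set of orthogonal unit vectors which are also orthogonal
to $\operatorname{Span}\{u_1, u_2, u_3\}$. The Gram-Schmidt process is one method for finding such vectors $\{u_4, u_5, u_6\}$.
Consider the following observations.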


1. The nullspace of $T$ is $\operatorname{Null}(T) = \operatorname{Span}\{v_{r+1}, \dots, v_n\} = \operatorname{Span}\{v_4\}$. Notice that $v_4$ is the coordinate
vector for the object
\[ \begin{array}{|c|c|} \hline -1/2 & \phantom{-}1/2 \\ \hline \phantom{-}1/2 & -1/2 \\ \hline \end{array} \]
for which $Tv_4 = 0$. This result follows from the fact that $v_4$ is a singular vector with singular value
of zero. It is also apparent by computing the transformation geometrically. The nullspace of $T$ is
the set of all invisible objects.

2. The range of $T$ (the column space of $T$) is $\operatorname{Ran}(T) = \operatorname{Span}\{u_1, \dots, u_r\} = \operatorname{Span}\{u_1, u_2, u_3\}$. The
range of $T$ is the set of all possible radiographs, that is, radiographs which correspond to some object
through the transformation.

3. The row space of $T$, $\operatorname{Row}(T) = \operatorname{Span}\{v_1, \dots, v_r\} = \operatorname{Span}\{v_1, v_2, v_3\}$, is the set of all objects that
are orthogonal to the set of invisible objects. These are also the objects that are fully recoverable
from radiographs.


The singular value decomposition is a powerful tool for understanding transformations, and the next
few sections will make extensive use of its properties. We will need to solidify some notation to help us
understand key ideas and formulate related transformations, such as pseudo-inverse transformations.


Notation Let $M$ be an $m \times n$ real matrix of rank $r$ with singular value decomposition $M = U\Sigma V^T$.


1. We denote the columns of $U$ by $u_k$ for $k \in \{1, 2, \dots, m\}$ and the columns of $V$ by $v_k$ for
$k \in \{1, 2, \dots, n\}$.

2. We denote the ordered singular values of $M$, the diagonal elements of $\Sigma$, as $\sigma_1 \ge \sigma_2 \ge \cdots \ge
\sigma_r > 0$, and $\sigma_{r+1} = \cdots = \sigma_{\min\{m,n\}} = 0$.

3. We define $\tilde{U}_s$ as the $m \times s$ matrix whose columns are the first $s$ columns of $U$ and $\hat{U}_s$ as the
$m \times (m - s)$ matrix whose columns are the last $m - s$ columns of $U$:
\[ \tilde{U}_s = \begin{pmatrix} u_1 & u_2 & \cdots & u_s \end{pmatrix}, \qquad \hat{U}_s = \begin{pmatrix} u_{s+1} & u_{s+2} & \cdots & u_m \end{pmatrix}. \]

4. We define $\tilde{V}_s$ as the $n \times s$ matrix whose columns are the first $s$ columns of $V$ and $\hat{V}_s$ as the
$n \times (n - s)$ matrix whose columns are the last $n - s$ columns of $V$:
\[ \tilde{V}_s = \begin{pmatrix} v_1 & v_2 & \cdots & v_s \end{pmatrix}, \qquad \hat{V}_s = \begin{pmatrix} v_{s+1} & v_{s+2} & \cdots & v_n \end{pmatrix}. \]

5. We define the $s \times s$ diagonal matrix $\tilde{\Sigma}_s$ for $s \le r$ as
\[ (\tilde{\Sigma}_s)_{ij} = \begin{cases} \sigma_i & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}. \]

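We can use this new notation to simply express the result of mapping an arbitrary vector $x \in \mathbb{R}^n$
to $b \in \mathbb{R}^m$ using the matrices $U$, $V$, and $\Sigma$. In the following construction, we assume $r \le m \le n$. The
other case, $r \le n \le m$, is similarly demonstrated with the same result. (See Exercise 8.)

\begin{align*}
Mx &= U\Sigma V^T x \\
&= \begin{pmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_m \\ | & | & & | \end{pmatrix}
\begin{pmatrix} \sigma_1 & & & & 0 & \cdots & 0 \\ & \sigma_2 & & & \vdots & & \vdots \\ & & \ddots & & & & \\ & & & \sigma_m & 0 & \cdots & 0 \end{pmatrix}
\begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{pmatrix} x \\
&= \begin{pmatrix} u_1 & u_2 & \cdots & u_m \end{pmatrix}
\begin{pmatrix} \sigma_1 & & & & 0 & \cdots & 0 \\ & \sigma_2 & & & \vdots & & \vdots \\ & & \ddots & & & & \\ & & & \sigma_m & 0 & \cdots & 0 \end{pmatrix}
\begin{pmatrix} \langle v_1, x \rangle \\ \langle v_2, x \rangle \\ \vdots \\ \langle v_n, x \rangle \end{pmatrix} \\
&= \begin{pmatrix} u_1 & u_2 & \cdots & u_m \end{pmatrix}
\begin{pmatrix} \sigma_1 \langle v_1, x \rangle \\ \sigma_2 \langle v_2, x \rangle \\ \vdots \\ \sigma_m \langle v_m, x \rangle \end{pmatrix} \\
&= \sigma_1 \langle v_1, x \rangle u_1 + \sigma_2 \langle v_2, x \rangle u_2 + \cdots + \sigma_m \langle v_m, x \rangle u_m.
\end{align*}

We summarize this general result for a matrix transformation of rank $r$:
\[ Mx = U\Sigma V^T x = \sum_{k=1}^{r} \sigma_k \langle v_k, x \rangle u_k = \tilde{U}_r \tilde{\Sigma}_r \tilde{V}_r^T x. \tag{7.9} \]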
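
Equation 7.9 is easy to confirm numerically. The OCTAVE sketch below compares the full product, the sum over the first r singular triples, and the reduced matrix product; the matrix and the test vector x are arbitrary choices of ours, not taken from the text.

```octave
% Sketch: check Mx = sum_k sigma_k <v_k,x> u_k = U_r * Sigma_r * V_r' * x.
M = [1 2 1; 0 1 1];                   % illustrative 2 x 3 matrix of rank 2
x = [1; -2; 3];                       % arbitrary test vector
[U, S, V] = svd(M);
sv = diag(S);
r = rank(M);

b = zeros(size(M, 1), 1);
for k = 1:r
  b = b + sv(k) * (V(:, k)' * x) * U(:, k);   % sigma_k <v_k, x> u_k
end

bReduced = U(:, 1:r) * diag(sv(1:r)) * V(:, 1:r)' * x;

disp(norm(M * x - b));                % near machine precision
disp(norm(M * x - bReduced));         % near machine precision
```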


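Example 7.5.8 Returning to Example 7.5.7, we can write the transformation $T = \tilde{U}_3 \tilde{\Sigma}_3 \tilde{V}_3^T$ using the
following matrices:
\[ \tilde{U}_3 = \begin{pmatrix} +1/\sqrt{6} & -1/\sqrt{12} & -1/2 \\ +1/\sqrt{6} & +1/\sqrt{12} & +1/2 \\ +1/\sqrt{6} & -1/\sqrt{3} & 0 \\ +1/\sqrt{6} & +1/\sqrt{3} & 0 \\ +1/\sqrt{6} & -1/\sqrt{12} & +1/2 \\ +1/\sqrt{6} & +1/\sqrt{12} & -1/2 \end{pmatrix}, \quad
\tilde{\Sigma}_3 = \begin{pmatrix} \sqrt{6} & 0 & 0 \\ 0 & \sqrt{3} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \text{and} \quad
\tilde{V}_3 = \begin{pmatrix} +1/2 & -1/2 & -1/2 \\ +1/2 & -1/2 & +1/2 \\ +1/2 & +1/2 & -1/2 \\ +1/2 & +1/2 & +1/2 \end{pmatrix}. \]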


The singular value decomposition has properties that lead to important results in constructing 
pseudo-inverse transformations and low-rank approximations of matrices. Before proceeding, we 
define the trace of a matrix in order to ease notation and then define a way of measuring the “size” of 
a matrix. 


Definition 7.5.9 


Let $A$ be an $n \times n$ matrix with entries $a_{ij}$. The trace of $A$, denoted $\operatorname{tr}(A)$, is the sum of the diagonal
entries of $A$:
\[ \operatorname{tr}(A) = \sum_{k=1}^{n} a_{kk}. \]


The trace of a matrix is indeed as simple as it sounds. For example, if we have the matrix
\[ M = \begin{pmatrix} 3 & 4 & 1 \\ -2 & 2 & 1 \\ 4 & 2 & 5 \end{pmatrix}, \]
then $\operatorname{tr}(M) = 3 + 2 + 5 = 10$.
Our measurement of the size of a matrix is the Frobenius norm.


Definition 7.5.10


Let $A$ be an $m \times n$ matrix. We define the Frobenius norm of $A$ as
\[ \|A\|_F = \sqrt{\operatorname{tr}(A^TA)}. \]


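The Frobenius norm is indeed a norm, derived from the Frobenius Inner Product, found in Table 7.1,
and satisfies the properties outlined in Theorem 7.1.18. We leave it to the reader to show this result.
Now, if $A$ is the $2 \times 3$ matrix given by
\[ A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 1 \end{pmatrix}, \]
then
\[ A^TA = \begin{pmatrix} 1 & 2 \\ 2 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 5 & 4 & 2 \\ 4 & 5 & 1 \\ 2 & 1 & 1 \end{pmatrix}. \]
So,
\[ \|A\|_F = \sqrt{\operatorname{tr}(A^TA)} = \sqrt{11}. \]
Another way to compute $\|A\|_F$ is
\[ \|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}. \]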
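
The two formulas for the Frobenius norm can be compared directly in OCTAVE. This sketch assumes the 2 x 3 matrix A = [1 2 0; 2 1 1] (the matrix whose transpose appears in the product above); the last line uses the standard fact that the squared Frobenius norm is the sum of the squared singular values, which is not stated in the text but connects the norm to the SVD.

```octave
% Sketch: several equivalent ways to compute the Frobenius norm.
A = [1 2 0; 2 1 1];                    % assumed example matrix

f1 = sqrt(trace(A' * A));              % definition via the trace
f2 = sqrt(sum(A(:) .^ 2));             % square root of the sum of squared entries
f3 = norm(A, 'fro');                   % built-in Frobenius norm
f4 = sqrt(sum(svd(A) .^ 2));           % via the singular values

disp([f1 f2 f3 f4]);                   % all equal sqrt(11), about 3.3166
```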
Now, given an m x n matrix, M, we present a method, using the singular value decomposition, for 
creating a rank s approximation of M. The approximation will be close to M as measured by the 
Frobenius norm. 


Theorem 7.5.11 
Let $M$ be an $m \times n$ real matrix of rank $r$ with singular value decomposition $M = U\Sigma V^T$. Let
\[ M_s = \sum_{k=1}^{s} \sigma_k u_k v_k^T \quad \text{for all } 1 \le s \le r. \]
Then

(a) $M = M_r$,

(b) $\|M - M_s\|_F \le \sum_{k=s+1}^{r} \sigma_k$, and

(c) $M_s$ has rank $s$.


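Proof. (a) Suppose $M$ is a real $m \times n$ matrix of rank $r$ with singular value decomposition $M = U\Sigma V^T$.
Let $v_1, v_2, \dots, v_n$ be the columns of $V$, $u_1, u_2, \dots, u_m$ be the columns of $U$, and $\sigma_1, \sigma_2, \dots, \sigma_r$
be the nonzero diagonal entries of $\Sigma$. Then $\{v_1, v_2, \dots, v_n\}$ is an orthonormal basis for $\mathbb{R}^n$. We
know that the two matrices $M$ and $M_r$ are equal if their corresponding linear transformations are
the same. Using Corollary 4.2.27, we verify that the transformations are equivalent by showing that
$Mv_j = U\Sigma V^T v_j = \left( \sum_{k=1}^{r} \sigma_k u_k v_k^T \right) v_j$ for all $j$. We have been using block matrices (see pages 182
and 416).
\begin{align*}
Mv_j = U\Sigma V^T v_j &= \tilde{U}_r \tilde{\Sigma}_r \tilde{V}_r^T v_j \\
&= \begin{pmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_r \\ | & | & & | \end{pmatrix}
\begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_r \end{pmatrix}
\begin{pmatrix} v_1^T \\ \vdots \\ v_r^T \end{pmatrix} v_j \\
&= \begin{pmatrix} | & | & & | \\ \sigma_1 u_1 & \sigma_2 u_2 & \cdots & \sigma_r u_r \\ | & | & & | \end{pmatrix}
\begin{pmatrix} v_1^T v_j \\ \vdots \\ v_r^T v_j \end{pmatrix} \\
&= \sum_{k=1}^{r} \sigma_k u_k \left( v_k^T v_j \right) \\
&= \left( \sum_{k=1}^{r} \sigma_k u_k v_k^T \right) v_j.
\end{align*}

(b) Next, let $M_s = \sum_{k=1}^{s} \sigma_k u_k v_k^T$ for $1 \le s \le r$. Then we have
\begin{align*}
\|M - M_s\|_F &= \left\| \sum_{k=1}^{r} \sigma_k u_k v_k^T - \sum_{k=1}^{s} \sigma_k u_k v_k^T \right\|_F \\
&= \left\| \sum_{k=s+1}^{r} \sigma_k u_k v_k^T \right\|_F \\
&\le \sum_{k=s+1}^{r} \sigma_k \left\| u_k v_k^T \right\|_F \\
&= \sum_{k=s+1}^{r} \sigma_k \sqrt{\operatorname{tr}\left( v_k u_k^T u_k v_k^T \right)} \\
&= \sum_{k=s+1}^{r} \sigma_k \sqrt{\operatorname{tr}\left( v_k v_k^T \right)} \\
&= \sum_{k=s+1}^{r} \sigma_k.
\end{align*}
The inequality in the proof follows from repeated application of the triangle inequality.
(c) The range of the transformation $T(v) = M_s v$ is the span of the $s$ linearly independent vectors
$\{u_1, u_2, \dots, u_s\}$. Thus, $\operatorname{Rank}(M_s) = \dim(\operatorname{Ran}(M_s)) = s$.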


Theorem 7.5.11 shows us that a matrix transformation (or any matrix) can be written as a linear
combination of outer products ($u_k v_k^T$) of unit vectors. If $M$ is $m \times n$, it is expressed as an array of
$mn$ real values. However, if $\operatorname{rank}(M) = r$, we can equivalently express the matrix with $(m + n + 1)r$
values ($r$ singular values, $r$ vectors $u_k \in \mathbb{R}^m$, and $r$ vectors $v_k \in \mathbb{R}^n$). If $(m + n + 1)r < mn$, then the
SVD provides a method of data compression.


Example 7.5.12 Consider the matrix
\[ A = \begin{pmatrix} 1 & -1 & 1 & 0 \\ 4 & -2 & 2 & 1 \\ 5 & -3 & 3 & 0 \end{pmatrix}. \]
The singular value decomposition of $A$ is given by the orthogonal matrices (expressed to three decimal
places):
\[ U \approx \begin{pmatrix} -0.199 & +0.376 & +0.905 \\ -0.591 & -0.783 & +0.195 \\ -0.782 & +0.496 & -0.378 \end{pmatrix}, \quad
V \approx \begin{pmatrix} -0.774 & -0.294 & -0.562 & 0.000 \\ +0.445 & -0.322 & -0.445 & -0.707 \\ -0.445 & +0.322 & +0.445 & -0.707 \\ -0.071 & -0.841 & +0.537 & 0.000 \end{pmatrix}, \]
and singular values $\sigma_1 \approx 8.367$, $\sigma_2 \approx 0.931$, $\sigma_3 \approx 0.363$. (The SVD is typically computed using
mathematical software. The above were computed using the MATLAB/OCTAVE code [U,S,V]=svd(A);)
We consider a rank-1 approximation of $A$ given by $A_1 = \sigma_1 u_1 v_1^T$. (In Exercise 7, the reader is asked to
show that such an outer product is, indeed, rank 1.) We have $\|A\|_F \approx 8.426$ and, from Theorem 7.5.11,
the error in the approximation satisfies $\|A - A_1\|_F \le \sigma_2 + \sigma_3 = 1.294$. We find that
\[ A_1 = \sigma_1 u_1 v_1^T \approx \begin{pmatrix} +1.287 & -0.741 & +0.741 & +0.118 \\ +3.826 & -2.203 & +2.203 & +0.349 \\ +5.059 & -2.912 & +2.912 & +0.462 \end{pmatrix}. \]
And, in fact, $\|A - A_1\|_F \approx 0.999$. We can also provide a rank-2 approximation for $A$:
\[ A_2 = A_1 + \sigma_2 u_2 v_2^T \approx \begin{pmatrix} +1.185 & -0.854 & +0.854 & -0.176 \\ +4.040 & -1.968 & +1.968 & +0.962 \\ +4.923 & -3.061 & +3.061 & +0.074 \end{pmatrix}, \]
where $\|A - A_2\|_F \approx 0.363$. Finally, we note that because $A$ is a rank 3 matrix, the rank three
approximation is exact, $A_3 = A$.
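
The rank-s approximations of Theorem 7.5.11 can be reproduced with the svd command. The OCTAVE sketch below assumes the matrix with rows (1, -1, 1, 0), (4, -2, 2, 1), (5, -3, 3, 0), which is the matrix consistent with the singular values listed in Example 7.5.12; the variable names are ours.

```octave
% Sketch: rank-s approximations A_s and their Frobenius-norm errors.
A = [1 -1 1 0; 4 -2 2 1; 5 -3 3 0];   % assumed matrix of Example 7.5.12
[U, S, V] = svd(A);
sv = diag(S);                          % approximately 8.367 0.931 0.363

err = zeros(1, 3);
bound = zeros(1, 3);
for s = 1:3
  As = U(:, 1:s) * S(1:s, 1:s) * V(:, 1:s)';   % sum of the first s outer products
  err(s) = norm(A - As, 'fro');                % actual error
  bound(s) = sum(sv(s+1:end));                 % bound from Theorem 7.5.11(b)
  fprintf('s = %d: error = %.3f, bound = %.3f\n', s, err(s), bound(s));
end
```

In this example the actual errors sit below the bounds of the theorem, which suggests that the bound, while convenient, is not tight.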


In addition to approximating matrices, we can also determine approximations of linear transformations 
through their matrix representations. This is particularly useful if the transformation has a nullspace 
of large dimension compared to its rank. 

The matrix approximations of Theorem 7.5.11 can be applied to any two-dimensional array 
of numerical values. Grayscale images, as an array of pixel values, can be treated as a matrix 
and represented in compressed form using the singular value decomposition. RGB images can be 
compressed by compressing the three channels separately. 


Example 7.5.13 (Data Compression) Consider the bottom right sub-image of the seedling tree in
Figure 7.22. This image is a pixel array using 8-bit integer values (values between 0 and 255) in red,
blue, and green channels. Let $M^{(r)}$, $M^{(b)}$, and $M^{(g)}$ be the 772 by 772 matrices containing the red,
blue, and green 8-bit integer values. The values in each channel can be approximated with a rank-$s$
approximation through the singular value decomposition. In particular, we compute
\[ M_s^{(r)} = \sum_{k=1}^{s} \sigma_k^{(r)} u_k^{(r)} v_k^{(r)T} = \tilde{U}_s^{(r)} \tilde{\Sigma}_s^{(r)} \tilde{V}_s^{(r)T}, \]
where the singular value decomposition of $M^{(r)}$ is $U^{(r)} \Sigma^{(r)} V^{(r)T}$ (and similarly for the blue and green
channels). The resulting RGB approximate image is defined by the values in $M_s^{(r)}$, $M_s^{(b)}$, and $M_s^{(g)}$.
The rank-$s$ SVD approximations are not necessarily 8-bit integer values.

Example rank-$s$ compressed images are shown in Figure 7.22 for $s = 8, 16, 32, 64, 128$. The full image
arrays in the two sequences have rank 772 and rank 875, respectively. Notice that even the rank-8 approximation captures a
good proportion of the larger shape and color features. The rank-16 approximations are recognizable
scenes and require less than 5% of the memory storage of the full image. The rank-128 approximations,
which are visually indistinguishable from the full image, require less than one-third of the memory
storage.
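
The storage claims can be checked with the count $(m + n + 1)s$ from the discussion following Theorem 7.5.11. This OCTAVE sketch counts stored values per channel for a 772 by 772 array; it is an estimate only, since it ignores how many bytes each stored value occupies (an 8-bit pixel versus a floating-point singular vector entry), which is an assumption on our part.

```octave
% Sketch: fraction of values stored by a rank-s SVD approximation.
m = 772;  n = 772;                    % channel dimensions from Example 7.5.13
s = [8 16 32 64 128];                 % ranks shown in Figure 7.22

stored = (m + n + 1) * s;             % s singular values plus s vectors u_k and v_k
ratio = stored / (m * n);             % fraction of the m*n values in the full array

disp([s; 100 * ratio]');              % rank and percentage of the full storage
```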


7.5.2 Computing the Pseudo-Inverse 


The singular value decomposition of a matrix $T$ identifies an isomorphism between the row space,
$\operatorname{Null}(T)^\perp$, and column space, $\operatorname{Ran}(T)$, of a matrix. In particular, we have the pseudo-inverse theorem.


Theorem 7.5.14 [Pseudo-Inverse]
Let $M$ be an $m \times n$ real matrix of rank $r$ with singular value decomposition $M = U\Sigma V^T$. Then
for all integers $1 \le s \le r$,
\[ P_s = \tilde{V}_s \tilde{\Sigma}_s^{-1} \tilde{U}_s^T \]
is the unique $n \times m$ real matrix of rank $s$ such that $P_s Mx = x$ for all $x \in \operatorname{Span}\{v_1, v_2, \dots, v_s\}$.


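Proof. Suppose $M$ is an $m \times n$ real matrix of rank $r$ with singular value decomposition $M = U\Sigma V^T$.
Let $1 \le s \le r$. Then for $x \in \operatorname{Span}\{v_1, v_2, \dots, v_s\}$, there are scalars $\alpha_1, \alpha_2, \dots, \alpha_s$ so that
\[ x = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_s v_s. \]
Assume that $v_1, v_2, \dots, v_n$ are the columns of $V$, $u_1, u_2, \dots, u_m$ are the columns of $U$, and that $\sigma_1, \sigma_2,
\dots, \sigma_{\min\{m,n\}}$ are the diagonal entries of $\Sigma$. Then
\[ \tilde{U}_s^T u_k = \begin{cases} e_k & \text{if } 1 \le k \le s \\ 0 & \text{if } k > s \end{cases}, \]
where $e_k$ denotes the $k$th standard basis vector of $\mathbb{R}^s$. Therefore, we have
\[ P_s Mx = \tilde{V}_s \tilde{\Sigma}_s^{-1} \tilde{U}_s^T \sum_{k=1}^{s} \alpha_k M v_k
= \tilde{V}_s \tilde{\Sigma}_s^{-1} \sum_{k=1}^{s} \alpha_k \sigma_k \tilde{U}_s^T u_k
= \tilde{V}_s \tilde{\Sigma}_s^{-1} \sum_{k=1}^{s} \alpha_k \sigma_k e_k
= \tilde{V}_s \sum_{k=1}^{s} \alpha_k e_k
= \sum_{k=1}^{s} \alpha_k v_k = x. \]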
Fig. 7.22 Illustration of image compression using the singular value decomposition. The uncompressed images $M$ are
shown at the bottom right of each image sequence. The other five images in each sequence are compression examples
$M_s$ using $s$ singular vectors/values. The $s$ values are given in the corner of each image. The RGB color channels of these
images have been compressed independently.
Therefore, $P_s$ satisfies $P_s Mx = x$. The uniqueness of $P_s$ follows from Corollary 4.2.27.


The matrix $P_s$ is the pseudo-inverse that we seek. The pseudo-inverse matrix is written as a product
of three matrices. Revisiting the ideas that led to Equation 7.9, for the transformation $M = U\Sigma V^T$ of
rank $r$ we can write
\[ P_s = \tilde{V}_s \tilde{\Sigma}_s^{-1} \tilde{U}_s^T = \sum_{k=1}^{s} \frac{v_k u_k^T}{\sigma_k} \tag{7.10} \]
for all $0 < s \le r$. Here, we give the formal definition of the pseudo-inverse of rank $s$.
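
Both forms of Equation 7.10 translate directly into OCTAVE. The sketch below builds the rank-s pseudo-inverse as a sum of outer products and as a product of three matrices, then checks the defining property of Theorem 7.5.14 on a vector in the span of the first s right singular vectors. The matrix M, the choice s = 1, and the variable names are illustrative assumptions of ours.

```octave
% Sketch: rank-s pseudo-inverse from the SVD, in two equivalent forms.
M = [1 2 1; 0 1 1];                   % illustrative matrix of rank 2
[U, S, V] = svd(M);
sv = diag(S);
s = 1;                                % any integer with 1 <= s <= rank(M)

Ps = zeros(size(M'));                 % n x m
for k = 1:s
  Ps = Ps + V(:, k) * U(:, k)' / sv(k);              % v_k * u_k' / sigma_k
end

PsProduct = V(:, 1:s) * diag(1 ./ sv(1:s)) * U(:, 1:s)';

x = 2 * V(:, 1);                      % a vector in Span{v_1, ..., v_s}
disp(norm(Ps - PsProduct));           % the two forms agree
disp(norm(Ps * M * x - x));           % Ps*M*x recovers x
```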


Definition 7.5.15 


The matrix $P_s$ of Theorem 7.5.14 is the rank $s$ pseudo-inverse of $M$. When $r = \operatorname{Rank}(M)$, the
rank $r$ pseudo-inverse, $P_r$, is called the pseudo-inverse of $M$ and is denoted, simply, $P$.


Let us now consider a couple of examples of pseudo-inverse computations. 
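Example 7.5.16 Consider the matrix
\[ A = \begin{pmatrix} 4 & 2 & 2 & 2 & 6 \\ 1 & 1 & 2 & 1 & 5 \\ 2 & 3 & 4 & 4 & 3 \\ 5 & 2 & 1 & 2 & 4 \end{pmatrix}. \]
This $4 \times 5$ matrix is non-invertible and has rank $r = 3$. The singular value decomposition $A = U\Sigma V^T$
provides the pseudo-inverse
\[ P = \tilde{V}_r \tilde{\Sigma}_r^{-1} \tilde{U}_r^T \approx \begin{pmatrix} +0.0296 & -0.1584 & -0.0455 & +0.2027 \\ -0.0190 & -0.0543 & +0.0856 & +0.0258 \\ -0.0268 & +0.0498 & +0.1350 & -0.0899 \\ -0.0390 & -0.0724 & +0.1413 & +0.0139 \\ +0.0870 & +0.1900 & -0.0903 & -0.0596 \end{pmatrix}. \]
The pseudo-inverse matrix is $5 \times 4$. The reader should check to see that $PA \ne I_5$ and $AP \ne I_4$.
Consider the three vectors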
Consider the three vectors 
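\[ v_1 = \begin{pmatrix} 2 \\ 1 \\ 1 \\ 1 \\ 3 \end{pmatrix}, \quad v_2 = \begin{pmatrix} -2 \\ 3 \\ -4 \\ 2 \\ 1 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}. \]
The reader can also verify that $v_1 \in \operatorname{Row}(A)$ and indeed, $PAv_1 = v_1$. Notice that $v_2 \in \operatorname{Null}(A)$, that
is, $Av_2 = 0$. So $P(Av_2) = 0$. Thus, $v_2$ is not recoverable from the pseudo-inverse transformation. For
the third vector, we find
\[ PAv_3 = P \begin{pmatrix} 16 \\ 10 \\ 16 \\ 14 \end{pmatrix} \approx \begin{pmatrix} 0.998 \\ 0.882 \\ 0.970 \\ 1.108 \\ 1.014 \end{pmatrix}. \]
We see that the result of $PAv_3$ is only an approximation to $v_3$ and $v_3$ is not exactly recovered from
$Av_3$ by $P$. This means that $v_3 \notin \operatorname{Row}(A)$.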
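
The three cases of Example 7.5.16 can be reproduced numerically. This OCTAVE sketch builds the pseudo-inverse from the SVD, using a relative tolerance of our own choosing to decide the numerical rank, and applies it to the three vectors; the variable names are ours.

```octave
% Sketch: pseudo-inverse of the 4 x 5 matrix of Example 7.5.16.
A = [4 2 2 2 6; 1 1 2 1 5; 2 3 4 4 3; 5 2 1 2 4];
[U, S, V] = svd(A);
sv = diag(S);
r = sum(sv > 1e-10 * sv(1));          % numerical rank (assumed tolerance)

P = V(:, 1:r) * diag(1 ./ sv(1:r)) * U(:, 1:r)';

v1 = [2; 1; 1; 1; 3];
v2 = [-2; 3; -4; 2; 1];
v3 = ones(5, 1);

disp(norm(P * A * v1 - v1));          % near zero: v1 lies in Row(A)
disp(norm(A * v2));                   % zero: v2 lies in Null(A)
disp((P * A * v3)');                  % approximately 0.998 0.882 0.970 1.108 1.014
```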


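Example 7.5.17 Consider the matrix representation, $T$, of the transformation $D : \mathcal{P}_3(\mathbb{R}) \to \mathcal{P}_2(\mathbb{R})$
defined by $D(f(x)) = f'(x)$. Using the standard basis for $\mathcal{P}_3(\mathbb{R})$, the reader should verify that
\[ T = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \quad \text{and} \quad P = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/3 \end{pmatrix}. \]
From calculus, we expect that $P$ should represent an antiderivative transformation. Let $f(x) = a +
bx + cx^2 + dx^3$ be an arbitrary vector in $\mathcal{P}_3(\mathbb{R})$ where $a, b, c, d \in \mathbb{R}$. For notational simplicity, we
will use $[v]$ to indicate a coordinate vector according to the standard basis. We have
\[ [f(x)] = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}, \quad T[f(x)] = \begin{pmatrix} b \\ 2c \\ 3d \end{pmatrix}, \quad \text{and} \quad PT[f(x)] = \begin{pmatrix} 0 \\ b \\ c \\ d \end{pmatrix}. \]
The pseudo-inverse integration transformation does not recover the constant of integration $a$ because
the derivative transformation is not invertible. However, if $f(x) \in \operatorname{Span}\{x, x^2, x^3\}$, then $PT[f(x)] =
[f(x)]$.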
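
The derivative example is small enough to check in a few lines. The OCTAVE sketch below uses the built-in pinv command, which computes the Moore-Penrose pseudo-inverse; we assume here that it coincides with the rank r pseudo-inverse of Definition 7.5.15 (for this matrix the two agree). The coefficient vector f is an arbitrary choice of ours.

```octave
% Sketch: pseudo-inverse of the derivative matrix on P_3(R).
T = [0 1 0 0; 0 0 2 0; 0 0 0 3];      % derivative in the standard basis
P = pinv(T);                          % assumed equal to the rank-3 pseudo-inverse

f = [5; -1; 4; 2];                    % coordinates of 5 - x + 4x^2 + 2x^3
disp(P);                              % rows: 0 0 0 / 1 0 0 / 0 0.5 0 / 0 0 0.3333
disp((P * T * f)');                   % 0 -1 4 2: the constant term is lost
```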


In general, we have found that if $P$ is the pseudo-inverse of $M$, then it may happen that $PMx \ne x$. If the
pseudo-inverse is to be used to recover, or partially recover, transformed vectors, it is important to understand
the relationship between $PMx$ and $x$ for all $x$.


Corollary 7.5.18 

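Let $M$ be an $m \times n$ real matrix of rank $r$ with singular value decomposition $M = U\Sigma V^T$. Let
$X_s = \operatorname{Span}\{v_1, v_2, \dots, v_s\}$. Then $P_s Mx = \operatorname{proj}_{X_s} x = \tilde{V}_s \tilde{V}_s^T x$ for all $x \in \mathbb{R}^n$ and all integers
$1 \le s \le r$.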




The proof is similar to the argument in the proof of Theorem 7.5.14 applied to an arbitrary vector
$x \in \mathbb{R}^n$. The proof is the subject of Exercise 12.


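Example 7.5.19 Recall Example 7.5.17. The derivative transformation matrix $T$ has rank $r = 3$.
Let $X = \operatorname{Row}(T) = \operatorname{Span}\{[x], [x^2], [x^3]\}$. We also have $\operatorname{Null}(T) = \operatorname{Span}\{[1]\}$. For arbitrary $f(x) =
a + bx + cx^2 + dx^3 \in \mathcal{P}_3(\mathbb{R})$, we have
\[ \operatorname{proj}_X [f(x)] = [bx + cx^2 + dx^3] = \begin{pmatrix} 0 \\ b \\ c \\ d \end{pmatrix} = PT[f(x)]. \]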


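Example 7.5.20 Recall the example matrix $A$ in Example 7.5.16. Let $X = \operatorname{Row}(A)$. Because $v_1 \in X$,
$\operatorname{proj}_X v_1 = v_1$ and $PAv_1 = \operatorname{proj}_X v_1 = v_1$ as previously demonstrated. Also, Corollary 7.5.18 shows
us that
\[ \operatorname{proj}_X v_3 = PAv_3 \approx \begin{pmatrix} 0.998 \\ 0.882 \\ 0.970 \\ 1.108 \\ 1.014 \end{pmatrix}. \]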


Example 7.5.21 Recall the radiographic transformation $T$ of Example 7.5.7. We found that $\operatorname{Row}(T) =
\operatorname{Span}\{v_1, v_2, v_3\}$ and $\operatorname{Ran}(T) = \operatorname{Span}\{u_1, u_2, u_3\}$. The pseudo-inverse is the transformation
$P : \operatorname{Ran}(T) \to \operatorname{Row}(T)$ defined as
\[ P = \tilde{V}_r \tilde{\Sigma}_r^{-1} \tilde{U}_r^T = \frac{1}{12} \begin{pmatrix} 5 & -3 & 3 & -1 & -1 & 3 \\ -1 & 3 & 3 & -1 & 5 & -3 \\ 3 & -1 & -1 & 3 & -3 & 5 \\ -3 & 5 & -1 & 3 & 3 & -1 \end{pmatrix}, \]
where $r = 3$, the rank of $T$. And,
\[ PT = \tilde{V}_r \tilde{V}_r^T = \frac{1}{4} \begin{pmatrix} 3 & 1 & 1 & -1 \\ 1 & 3 & -1 & 1 \\ 1 & -1 & 3 & 1 \\ -1 & 1 & 1 & 3 \end{pmatrix}. \]
Now, consider the object vector $x = (1\ 2\ 3\ 4)^T$, which transforms to radiograph vector $y = Tx =
(7/2\ \ 13/2\ \ 3\ \ 7\ \ 9/2\ \ 11/2)^T$. The reconstructed object vector is $Py = (1\ 2\ 3\ 4)^T = x$. So, $x$ must be a vector
in the row space of $T$. In fact, $x = 5v_1 + 2v_2 + v_3$.

As a second example, consider the object vector $x = (1\ 2\ 2\ 2)^T$, which transforms to the radiograph
vector $y = Tx = (3\ \ 4\ \ 3\ \ 4\ \ 7/2\ \ 7/2)^T$. The reconstructed object vector is $Py = (5/4\ \ 7/4\ \ 7/4\ \ 9/4)^T \ne x$. So,
$x$ must have a nullspace component and $PTx$ is the projection of $x$ onto the row space of $T$. In this
case, the actual object description $x$ is not exactly recoverable from the radiograph $y$.
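
The two reconstructions in Example 7.5.21 can be verified numerically. This OCTAVE sketch assumes the 6 x 4 radiographic matrix T of Example 7.5.7 and builds the pseudo-inverse from the SVD with a rank tolerance of our own choosing; the variable names are ours.

```octave
% Sketch: pseudo-inverse reconstruction for the 4-voxel, 6-pixel scenario.
T = [1 1/2 1/2 0; 0 1/2 1/2 1; 1 1 0 0; 0 0 1 1; 1/2 1 0 1/2; 1/2 0 1 1/2];
[U, S, V] = svd(T);
sv = diag(S);                         % sqrt(6), sqrt(3), 1, and (numerically) 0
r = sum(sv > 1e-10 * sv(1));          % numerical rank (assumed tolerance)

P = V(:, 1:r) * diag(1 ./ sv(1:r)) * U(:, 1:r)';

xa = [1; 2; 3; 4];                    % lies in Row(T)
xb = [1; 2; 2; 2];                    % has a nullspace component

disp((P * (T * xa))');                % 1 2 3 4: exact recovery
disp((P * (T * xb))');                % 1.25 1.75 1.75 2.25: projection onto Row(T)
```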
Path to New Applications

Data analysts use principal component analysis (PCA) to find a small set of vectors that describe
their data well, reducing the dimension of the problem. These principal vectors are merely
translated singular vectors. The principal vectors then form an orthonormal basis for an affine
subspace (a set formed by translating a subspace away from the origin) that describes the data.
That is, the principal vectors form a basis for the subspace prior to translating. See Section 8.3.2
to learn more about PCA and other tools for data applications.


7.5.3 Exercises 


Compute the singular value decomposition and pseudo-inverse of each of the matrices in Exercises 1-6 
using the methods illustrated in the Examples. 


111 
swe (12), 
01 

4,.M= {12 
23 
10 
5 w= (!9) 
6. $M = \begin{pmatrix} 1 & -1 & 0 & 1 \\ -2 & 2 & 0 & 1 \\ 0 & 0 & -1 & -1 \end{pmatrix}$


Additional Exercises 


7. Show that if $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$ are both nonzero, then the corresponding $m \times n$ matrix given
by the outer product, $uv^T$, has rank 1.

8. Show that Equation 7.9 holds for the case $r \le n \le m$.

9. Argue that the singular value decomposition of a matrix is not unique.

10. Let $M$ be a real-valued matrix. Prove that if $M^{-1}$ exists, then $M^{-1}$ is equal to the pseudo-inverse
of $M$.

11. Suppose matrix $M$ has singular value decomposition $M = U\Sigma V^T$. Find a singular value
decomposition for $M^T$.

12. Prove Corollary 7.5.18.

13. Consider Corollary 7.5.18. Show that, for transformation $T$ with singular value decomposition
$T = U\Sigma V^T$, pseudo-inverse $P$, and rank $r$, $PT = \tilde{V}_r \tilde{V}_r^T$.

14. Find the rank s pseudo-inverses (for all possible s) of the transformation from Section 7.4. 

15. Consider the radiography Examples 7.5.7 and 7.5.21. Suppose the (unknown) object is 


if 1 
(a) Using the standard bases, determine $[x]$ and the radiograph $[y] = T[x]$.

(b) Compute $[\hat{x}]$, the object reconstructed from $[y]$ using the pseudo-inverse transformation.
(c) Show that $[x]$ and $[\hat{x}]$ differ by a null vector of $T$.

(d) Use vector norm concepts to measure the similarity between $[x]$ and $[\hat{x}]$.


16. Determine a singular value decomposition for the general heat diffusion transformation. (Recall 
that the transformation is symmetric.) 

17. Consider Equation 7.10. Compute the rank-1 through rank-r pseudo-inverse matrices of the 
transformation matrix of Exercise 6. 


7.6 Explorations: Pseudo-Inverse Tomographic Reconstruction 


We now have the linear algebra tools for performing brain image reconstructions from radiographic
data. We have seen that the radiographic transformation $T : \mathbb{R}^N \to \mathbb{R}^M$ may not have ideal properties
such as invertibility. In fact, it need not be one-to-one, nor onto. Making use of the singular value
decomposition of $T$, we can construct a pseudo-inverse transformation $P$, which is the isomorphism
between $\operatorname{Ran}(T)$ and $\operatorname{Null}(T)^\perp$. We have seen that this means $PTx = x$ for all $x \in X = \operatorname{Null}(T)^\perp$.
So, if we are given a radiograph $y$, the pseudo-inverse reconstruction produces $\hat{x} = Py$. Our goal
is to determine a pseudo-inverse procedure so that $\hat{x}$ is a “good” approximation of $x$. This section
brings together many linear algebra topics. Completion of this exploration requires some synthesis
and creative thought. We encourage the reader to be prepared for both. We hope the reader will find
enjoyment in this last activity. We expect that such an activity can be enjoyed as a project extending a
linear algebra course using this text.

In order to complete this exploration, you will need the following OCTAVE/MATLAB code and data 
file: 


• tomomap.m
• Lab6Radiographs.mat


Here, we explore a radiographic scenario that uses 108 by 108 voxel brain image layers and 30 
radiographic views of 108 pixels each. (Recall your answer to Exercise 2 of Chapter 1.) The radiographic 
views are at equally spaced angles beginning with 0° and ending at 174°. The object voxels and image 
pixels have the same width. 


7.6.1 The First Pseudo-Inverse Brain Reconstructions 


We begin by constructing the pseudo-inverse transformation corresponding to the radiographic 
scenario. Then, we apply this transformation to our radiographic data. We refer the reader to Section 4.1 
to be reminded of the appropriate use of tomomap.m. 


1. Construct the transformation T using the tomomap function. What is the size of T? 

2. Next, we will carefully construct the pseudo-inverse $P$ using the singular value decomposition
$T = U\Sigma V^T$. We do not need the full decomposition of $T$ to perform the reconstruction (see
Theorem 7.5.11). In this exercise, we obtain the decomposition $T = \tilde{U}\tilde{\Sigma}\tilde{V}^T$. The matrices $\tilde{U}$, $\tilde{\Sigma}$,
and $\tilde{V}$ are obtained using the svd command.


>> [U,S,V]=svd(T,'econ');
This computation may take a few minutes for your computer to complete. The second input
argument asks the svd command to only return the matrices $\tilde{U}$, $\tilde{\Sigma}$, and $\tilde{V}$, which are called U, S,
and V for clarity in the code. The inverse of $\tilde{\Sigma}$ can be directly computed because it is diagonal:


>> SI=diag(1./diag(S)); 
The pseudo-inverse is then computed as 


>> P=V*SI*U'; 


(a) What is the rank of $T$? (The numerically stable method for finding rank of a large matrix is
by observing the singular values. What kind of observation should you make here?)

(b) What is the dimension of $\operatorname{Null}(T)^\perp$?

(c) What is $\operatorname{Nullity}(T)$?

(d) Is $T$ one-to-one? Onto?


3. Now, load the data file Lab6Radiographs.mat that contains the radiograph data as a matrix B.
The first 181 columns of B are noise-free radiographic data vectors for each brain slice. The next
181 columns of B are low-noise radiographic data vectors for each brain slice.


>> load Lab6Radiographs.mat;


Next, compute and view the pseudo-inverse reconstructions for both noise-free and low-noise 
radiographic data. All 362 reconstructions are obtained through a left multiplication by the 
pseudo-inverse operator. 


>> recon=P*B; 


Use the following lines of code to display the noise-free and low-noise reconstructions for a 
single brain slice. 


>> slice=50;                             % choose a slice to view
>> brain1=recon(:,slice);                % noise-free recon
>> brain1=reshape(brain1,108,108);
>> brain2=recon(:,slice+181);            % low-noise recon
>> brain2=reshape(brain2,108,108);
>> figure('position',[0 0 1200 450]);    % figure size
>> imagesc([brain1 brain2]);
>> set(gca,'visible','off');             % don't draw the axes
>> caxis([-32 287]);                     % set color scale
>> colormap(gray);
>> colorbar('fontsize',20);


The figures that you get show the slice reconstruction from noise-free data (on the left) and from 
noisy data (on the right). 

Observe and describe reconstructions of several slices of the brain. You need not choose slice 50;
choose any number between 1 and 181. Note that slices above 160 (slice 1 is near the
chin and slice 181 is above the top of the skull) are not all that interesting. Recall that images in
this vector space are expected to be 8-bit grayscale (values from 0 to 255).


You have now computed brain slice reconstructions! Some do not look that great. We see that the 
noisy radiographs did not give useful reconstructions. Our next step is to figure out how to attempt 
reconstructions from noisy data. 
7.6.2 Understanding the Effects of Noise


We can understand why the low-noise reconstructions have such poor quality by examining the details
of the transformation $P$. Consider Equation 7.10 (with $s = r = \operatorname{rank}(T)$):
\[ P = \tilde{V}_r \tilde{\Sigma}_r^{-1} \tilde{U}_r^T = \sum_{k=1}^{r} \frac{v_k u_k^T}{\sigma_k}. \]
Reconstruction of a brain slice image is obtained by left multiplication by $P$. Let $y$ be a noise-free
radiograph and $b$ a low-noise radiograph of the same object. These differ by an unknown noise vector
$\eta$. That is, assume $b = y + \eta$.


4. Expand $Pb$ to be of the form $\hat{x} + x_\eta$, using the representation for $P$ above and our assumption
that $b = y + \eta$. Here, $\hat{x}$ is the noise-free reconstruction and $x_\eta$ is the contribution to
the reconstruction due to the noise.

5. Explain the nature of your initial brain slice reconstructions in terms of how $P$ interacts with
$\eta$. (Consider how it might occur that the contribution of the reconstruction due to noise might
create the effects you saw above. In particular, what properties of $\eta$ and of $T$ (seen in $\hat{x}$) might
be reasons for $x_\eta$ to obscure $\hat{x}$?)

6. Test your ideas by examining the individual terms in your expansion for some particular
radiograph slices. For example, let b = B(:, slice + 181).


7.6.3 A Better Pseudo-Inverse Reconstruction 


The pseudo-inverse transformation suffers from noise amplification along some directions $v_k$. We can
understand why this happens based on the size of the singular values $\sigma_k$ of $T$. If a particular $\sigma_k$ is very
small, then any noise contribution along $u_k$ results in an amplified reconstruction contribution along
$v_k$, possibly dominating the noiseless part of the reconstruction. We can alleviate this problem by
carefully crafting alternative or approximate pseudo-inverse transformations.
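
The amplification mechanism can be seen on a toy problem. The following OCTAVE sketch is entirely hypothetical: the 2 x 2 transformation, its singular values, the object, and the noise vector are invented for illustration and have nothing to do with the brain data. It places a small amount of noise along u_2, whose singular value is small, and compares the noise contribution with the noise-free reconstruction.

```octave
% Sketch: noise along u_k is scaled by 1/sigma_k in the reconstruction.
c = cos(pi / 6);  s = sin(pi / 6);
U = [c -s; s c];                      % hypothetical left singular vectors
V = [s -c; c s];                      % hypothetical right singular vectors
sigma = [1; 1e-3];                    % one very small singular value

T = U * diag(sigma) * V';
P = V * diag(1 ./ sigma) * U';        % pseudo-inverse (T is invertible here)

x = [1; 1];                           % hypothetical object
y = T * x;                            % noise-free data
eta = 1e-2 * U(:, 2);                 % small noise along u_2
b = y + eta;

xhat = P * y;                         % noise-free reconstruction (equals x)
xeta = P * eta;                       % noise contribution: (1e-2 / 1e-3) * v_2

disp(norm(eta) / norm(y));            % relative size of the noise in the data
disp(norm(xeta) / norm(xhat));        % relative size in the reconstruction
```

In this toy case the noise contribution has norm 10 while the object has norm about 1.4, so the reconstruction P*b is dominated by noise even though the data noise is small.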


7. In order to understand a strategy for reconstruction from noisy data, we want to consider how
the noise might have a contribution along some u_k. We begin with an understanding of noise.
Create a list of noise-like properties that a vector might have. Do any of the basis vectors u_k have
noise-like properties?

8. As we stated above, a larger contribution in a particular direction can occur based on the size of the 
singular values. Create a visualization of the singular values of T that will be useful in determining 
their relative size. 

9. Of course, if we knew exactly what the data noise η is, we would just remove it and reconstruct as
if there was no noise. But η is unknown and cannot be removed before attempting a reconstruction
of the brain slice. How might one utilize the known information contained in the singular value 
decomposition to reduce noise amplification? Devise one or more modified pseudo-inverse 
transformations that are not as sensitive to noise amplification. Recall the study of pseudo-inverse 
construction in Section 7.5. 

10. Use your modified pseudo-inverse to perform brain reconstructions on the low-noise radiographs. 
Compare your new reconstructions to the pseudo-inverse reconstructions obtained from the noise-free
radiographs. Some example reconstructions of slice 50 are shown in Figure 7.23. These
reconstructions use modified pseudo-inverse transformations applied to the low-noise data. 

11. Next, explore how one might choose a “best” modified pseudo-inverse. 
Fig. 7.23 Example modified pseudo-inverse reconstructions of brain slice 50. Reconstructions are inverted from
low-noise radiographic data.


(a) Suppose, for the sake of argument, that the actual brain slice description x is known. How 
would you determine the best pseudo-inverse transformation for recovering x from noisy 
data b? Remember, the noise vector η is unknown. You might consider how you would
measure how well the reconstruction approximates the actual brain slice. 

(b) Now, suppose, more realistically, that x is unknown, but the magnitude of the noise, β = ‖η‖,
is known or can be reasonably estimated. How would you determine the best pseudo-inverse 
transformation for recovering x from noisy data b? 


12. Using some of the tools found in Questions 9 and 11, find “good” pseudo-inverse reconstructions 
for several brain slices. In this exercise, both the actual object, x, and the magnitude of the noise,
β, are unknown. Comment on the quality of the reconstructions and estimate the magnitude of
the noise vector. It is important to recognize here that you are exploring, using ideas above, to 
adjust the pseudo-inverse and find the corresponding reconstruction. Your goal is to gain more 
information about how a “good” reconstruction can be obtained. 


7.6.4 Using Object-Prior Information 


We have seen that the pseudo-inverse transformation P : Ran(T) → Null(T)^⊥ is an isomorphism,
where T has rank r. So, if b ∈ Ran(T), then x = Pb is the unique vector in Null(T)^⊥ for which
Tx = b. Also, if b ∉ Ran(T), say b = y + w where y ∈ Ran(T) and w ∈ Ran(T)^⊥, then x = Pb =
Py is the unique vector in Null(T)^⊥ for which Tx = y. That is, w ∈ Null(P).

In a tomographic situation, the radiographic data b may be noisy, say b = y + η. If η ∈ Ran(T)^⊥ =
Null(P), then it does not affect the reconstruction. In this case, we obtain the desired result Pb = Py =
x. But if η has a component in Ran(T), say η = η_y + η_w where η_y ∈ Ran(T) and η_w ∈ Ran(T)^⊥, then
it is possible that ‖x‖ < ‖Pη_y‖ so that our reconstruction is dominated by noise effects. And, since we
do not know η (or η_y), it is difficult to obtain meaningful brain images.

The singular value decomposition of T and the rank-s pseudo-inverse P_s provided a means of
reducing the effects of noise amplification, allowing us to obtain meaningful brain image reconstructions.
However, deciding how to choose an appropriate P_s is unclear unless one has some knowledge of the
noise vector, such as its magnitude.
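
As a rough illustration of how a rank-s pseudo-inverse can be built and how a known noise magnitude might guide the choice of s, consider the Python (NumPy) sketch below. The function names `rank_s_pinv` and `choose_rank`, the small diagonal matrix `T`, and the noise vector are hypothetical. The selection rule (take the smallest s whose residual does not exceed the noise magnitude) is one common heuristic, swapped in here for illustration; it is not presented as the method intended by the explorations.

```python
import numpy as np

def rank_s_pinv(T, s):
    """P_s = sum_{k=1}^{s} v_k u_k^T / sigma_k, built from the SVD of T."""
    U, sigma, Vt = np.linalg.svd(T, full_matrices=False)
    return Vt[:s].T @ np.diag(1.0 / sigma[:s]) @ U[:, :s].T

def choose_rank(T, b, beta):
    """Smallest s with residual ||T P_s b - b|| <= beta (assumed heuristic)."""
    r = int(np.linalg.matrix_rank(T))
    for s in range(1, r + 1):
        if np.linalg.norm(T @ (rank_s_pinv(T, s) @ b) - b) <= beta:
            return s
    return r

# Hypothetical example with one very small singular value.
T = np.diag([4.0, 2.0, 1e-3])
y = T @ np.array([1.0, 1.0, 1.0])      # noise-free data
eta = np.array([0.006, -0.008, 0.01])  # noise vector (unknown in practice)
b = y + eta
beta = np.linalg.norm(eta)             # assumed known noise magnitude

s = choose_rank(T, b, beta)
x_s = rank_s_pinv(T, s) @ b            # rank-s reconstruction
x_full = np.linalg.pinv(T) @ b         # full pseudo-inverse reconstruction
print(s, x_s, x_full)
```

In this toy setting the full pseudo-inverse divides the noise in the third coordinate by the smallest singular value, while the truncated version discards that direction entirely.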

In our next task, we consider additional methods for improving the tomographic process. We may 
also have some knowledge about the properties of the unknown object. 


13. Choose two or three of the best reconstructions you have found in the previous exercises. Consider 
reconstructions from both noise-free and low-noise radiographs. List and justify reasons why you 
know that these reconstructions are not accurate. 

14. The radiographic transformation T from which the pseudo-inverse transformation P was obtained 
is not one-to-one. List properties (and their radiographic interpretations) of transformations 
that are not one-to-one. Discuss how one or more of these properties may assist in finding a 
more realistic brain image from the pseudo-inverse reconstruction. It may be useful to review 
Sections 4.6 and 4.7. 

15. Consider the specific goal of finding a brain slice image x + w where Pb = x, b is the
radiographic data, w ∈ Null(T), and the entries of x + w are nonnegative.


(a) Would it make sense for entries of x + w to be negative? 

(b) What does the size of Nullity(T) tell us about the type/amount of object detail found in
Null(T)?

(c) How might a suitable null vector w be determined using linear algebra tools? 

(d) To obtain a reconstructed object that produces the radiographic data we are given, we want 
to be sure we are changing the reconstruction using “invisible” object data. How do we 
make sure that we are making adjustments close to our desired corrections while keeping 
the adjustments “invisible?” 


16. Suppose we have a candidate (non-null) correction vector z ∉ N = Null(T). We do not have a
basis for N. So, how can we compute the closest null vector w = proj_N z? See Theorem 7.2.19
and the subsequent Corollaries 7.2.20 and 7.2.21. Keep in mind that we have only the transformation
T along with U, Σ, and V.

17. Choose two brain reconstructions of the same slice: one from noise-free data (x_1), and another
from low-noise data (x_2). Using the results of Questions 15 and 16, and for both reconstructions:


(a) Compute correction vectors w_1 and w_2,
(b) Verify that w_1 and w_2 are null vectors of T,
(c) Compute the corrected reconstructions x_1 + w_1 and x_2 + w_2.


Carefully examine the corrected reconstructions and draw some conclusions about the success of
the procedure. In particular, how well did the corrections w_1 and w_2 “fix” the known problems
in x_1 and x_2?
Fig. 7.24 Example brain image reconstructions of slice 62 (upper set of images) and slice 140 (lower set of images).
The first row of each set are reconstructions from noise-free radiographs and the second row from low-noise radiographs.
The columns correspond to the pseudo-inverse, the rank-2950 pseudo-inverse, the first null-space correction, and the
iterative null-space correction, respectively.


18. How might you further improve your corrected reconstructions using an iterative procedure? 
Apply your procedure to the two reconstructions of question 17 and discuss the results in detail. 
Figure 7.24 shows example results for brain slices 62 and 140.

19. The current set of explorations has focused on the physical reality that brain densities cannot 
be negative. Describe several other prior knowledge conditions that could be applicable to the 
tomography problem. Describe several other types of linear inverse problems that could benefit 
from prior knowledge null-space enhanced corrections. 
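
One possible way to carry out the null-space correction discussed in Questions 15 to 17 is sketched below in Python (NumPy). It assumes that the first r right singular vectors of T form an orthonormal basis of Null(T)^⊥, so that the projection onto Null(T) can be computed by subtracting the component in Null(T)^⊥. The function name `project_onto_null_space`, the random matrix `T`, and the choice of candidate correction `z` (lifting negative entries to zero) are hypothetical choices for illustration only.

```python
import numpy as np

def project_onto_null_space(T, z):
    """Closest vector to z in Null(T), computed without a basis for Null(T)."""
    U, sigma, Vt = np.linalg.svd(T, full_matrices=False)
    r = int(np.sum(sigma > 1e-10 * sigma[0]))   # numerical rank of T
    V_r = Vt[:r].T        # columns v_1..v_r: orthonormal basis of Null(T)^perp
    return z - V_r @ (V_r.T @ z)

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 6))        # hypothetical T with a 3-dimensional null space
x_object = rng.uniform(0.0, 1.0, 6)    # nonnegative "true" object
b = T @ x_object                       # radiographic data
x = np.linalg.pinv(T) @ b              # pseudo-inverse reconstruction

z = np.maximum(x, 0.0) - x             # candidate correction: raise negative entries to 0
w = project_onto_null_space(T, z)      # "invisible" part of the correction
x_corrected = x + w

print(np.linalg.norm(T @ w))           # should be near zero
print(x.min(), x_corrected.min())
```

Because w lies in Null(T), the corrected image produces the same radiograph as x. A single correction need not remove every negative entry, which is one motivation for repeating the step.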
7.6.5 Additional Exercises


1. Plot several (left) singular vectors u_k of T, some of which correspond to small singular values and
some to large singular values. Which k values appear more prone to noise amplification in a 
pseudo-inverse process? Justify your conclusions. 

2. Compute several null-vector enhanced brain slice reconstructions using the prior knowledge that 
the reconstructed density image, ρ, must have values that lie in a given range: [0, ρ_max].

3. Suppose object densities are known to come from a finite specified list {ρ_1, ρ_2, ..., ρ_m}. How can
you implement this prior knowledge in a null-vector enhanced reconstruction scheme? 

4. A fiducial object is an object of known geometry and density used in radiographic testing. Suppose 
one or more fiducial objects are placed in the object space during the radiographic process. Describe 
how the fiducial information can be used in a null-vector enhanced reconstruction scheme. 



Conclusions 


We began with two main application ideas: Radiography/tomography of brain scan images and heat 
diffusion for diffusion welding and image warping. In this chapter, we would like to recognize what 
we did and how we move forward. 


8.1 Radiography and Tomography Example 


In this text, we considered the example of brain scan tomography. Given a radiograph made from 
several views of an object, we want to find the object that produced such a radiograph. See Figure 8.1. 
In the case of brain scans, we recognize that the more views used, the more radiation the patient 
experiences. To limit such exposure, our goal was to find good reconstructed brain images from few 
views. We recognize that the radiographic transformation is, in general, not one-to-one. This means 
that every radiograph can be produced from many different objects. We saw that the two objects in 
Figure 8.2 produce the same radiograph, though there are subtle differences. This led us to ask which 
object, among infinitely many possibilities, produced the given radiograph. 

Because no inverse exists for the radiographic transformation T, we sought a method for “inverting”
the transformation. Really, we were just looking for one solution to Tx = b. Using orthogonal 
coordinates, we were able to simplify the pseudo-inverse computations. After finding a solution, we 
realized that just any solution was not good enough, especially in the event that the radiographic 
data included measurement noise (see Figure 8.3). In the end, we were able to create a pseudo-inverse
algorithm that gave good approximate reconstructions, relying heavily on the linearity of the
radiographic transformation and the idea that all objects that produce the same radiograph differ by 
nullspace objects. Determining a desired adjustment vector in the nullspace via orthogonal projection 
provided a new reconstruction that we found to be better. In the end, we produced good approximate 
reconstructions (see Figure 7.24). 


Fig. 8.1 How do we reproduce the brain slices (right) from the radiographs (left)? 
Fig. 8.3 Left: Reconstructed brain image slice, from noise-free data, using a pseudo-inverse. Right: Reconstructed brain
image slice, from low-noise data, using a pseudo-inverse.


The problem of object reconstruction arises in many applications and is a current problem in image 
analysis. We have just scratched the surface in this field of study. There is still much to do to create 
good reconstructions. We encourage the interested reader to explore more. 


8.2 Diffusion 


In this text, we also considered the example of heat diffusion for diffusion welding. In particular, 
we sought a method for describing long-term behavior as heat diffuses along a long thin rod. In our 
exploration, we found particular heat states that diffused in a way that only scaled the vector (see 
Figure 8.4). More importantly, we found that these eigenvectors formed a basis for the diffusion 
transformation. This was an exciting find as we were able to describe the long-term behavior without 
repeated application of (multiplication by) the diffusion matrix. Instead, we used the powers of the 
eigenvalues to describe long-term behavior in two ways: 
Fig. 8.4 An eigenvector of the diffusion matrix diffuses by scaling in amplitude only.


• First, we were able to create a diagonalization of the diffusion matrix E, using eigenvectors
v_1, v_2, ..., v_n as columns of a matrix V and the corresponding eigenvalues λ_1, λ_2, ..., λ_n as
diagonal entries of the diagonal matrix D, giving for any time step k

$$h(k) = E^k h(0) = V D^k V^{-1} h(0).$$

Because D is diagonal, D^k is computed by raising each diagonal entry to the kth power.
• We also viewed the heat state at the kth time step by writing it as a linear combination of the
eigenvectors as follows:

$$h(0) = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n,$$

$$h(k) = \alpha_1 \lambda_1^k v_1 + \alpha_2 \lambda_2^k v_2 + \cdots + \alpha_n \lambda_n^k v_n.$$

Again, this computation is simplified because we need only raise the eigenvalues to the kth power.
But, more interesting is that this representation shows that the larger the value of λ_j, the more
contribution to the kth heat state comes from v_j.
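
The two descriptions of the kth heat state can be compared numerically. The Python (NumPy) sketch below assumes a symmetric tridiagonal diffusion matrix with a hypothetical parameter `delta`, a hypothetical initial heat state `h0`, and a small size n = 5; these values are illustrative assumptions only.

```python
import numpy as np

n, delta, k = 5, 0.25, 20   # hypothetical size, diffusion parameter, and time step

# Assumed form of the diffusion matrix: 1 - 2*delta on the diagonal, delta beside it.
E = (1 - 2 * delta) * np.eye(n) + delta * (np.eye(n, k=1) + np.eye(n, k=-1))

lam, V = np.linalg.eigh(E)            # eigenvalues and eigenvectors (E is symmetric)
h0 = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# (1) Diagonalization: h(k) = V D^k V^{-1} h(0).
h_k_diag = V @ np.diag(lam**k) @ np.linalg.inv(V) @ h0

# (2) Eigenvector expansion: h(k) = sum_j alpha_j lam_j^k v_j.
alpha = np.linalg.solve(V, h0)        # coordinates of h(0) in the eigenbasis
h_k_sum = sum(alpha[j] * lam[j]**k * V[:, j] for j in range(n))

# Reference: repeated multiplication by E.
h_k_direct = np.linalg.matrix_power(E, k) @ h0

print(lam)
print(h_k_diag)
```

After many time steps, the terms with the largest eigenvalues dominate the sum in (2), which is the observation made in the second bullet.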


Simplifying the computations gives us a means for predicting the best time to remove a rod from the 
diffusion welder. In other Markov processes, we see a similar ability to predict the state of the process 
at a particular iteration. 


8.3 Your Next Mathematical Steps 


We promised in Chapter 1 that you would be prepared to use linear algebra tools in other areas of
mathematics. Below, we provide brief discussions to connect linear algebra tools to a few (of many) 
areas of active research. We hope that it will encourage you to seek out more detailed resources and 
continue your investigations. 
8.3.1 Modeling Dynamical Processes


Modeling dynamic processes means that we are finding functions based on how variables change
with respect to others. From calculus, we know that this means we have derivatives like dy/dt or partial
derivatives like ∂y/∂x_i. Observations and various scientific laws instruct scientists, economists,
and data analysts about how to form differential equations that describe how a function changes based
on the independent variables. A differential equation is an equation where the unknown is a function, 
and the structure of the equation involves terms that include the function and its derivatives. We saw a 
partial differential equation related to heat diffusion given by (in 3D) 


$$c \nabla \cdot \nabla y = c \sum_{i=1}^{3} \frac{\partial^2 y}{\partial x_i^2} = \frac{\partial y}{\partial t}.$$


In solving this equation, the goal is to find a function y : R^4 → R that has first and second partial
derivatives that satisfy the above equation. Notice that this equation says that the value of y changes 
over time in a way that is related to how y curves in the x; directions. 

Researchers interested in providing a tool for early detection of tsunamis create partial differential 
equations (and sometimes systems of partial differential equations) using information about how the 
wave speed and shape changes due to the impedance created by the topography of the ocean floor, 
currents, wind, gravitational forces, energy dissipation, and water temperature. Partial differential 
equations are also used to characterize and predict vegetation patterns. Such patterns arise as the 
result of resource competition and various climatic and terrain conditions. Similar partial differential 
equations can be used to describe skin coloration patterns in mammals such as zebras and leopards. 

Based on the properties of differentiation given in calculus, we learned that the derivative operator is 
linear. One can then find eigenfunctions f_k for the differential operator D (D f_k = λ_k f_k) and construct
solutions based on initial values and boundary conditions. Approximate solutions can be found by 
discretizing the function and its derivatives thereby creating matrix equations. One can then apply the 
various techniques given in the invertible matrix theorem (Theorem 7.3.16) to determine the existence 
of solutions. 
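
To make the discretization idea concrete, the following Python (NumPy) sketch replaces a second derivative by a matrix. The boundary value problem −u″ = f on (0, 1) with u(0) = u(1) = 0 and f(x) = π² sin(πx) is a hypothetical example chosen because its exact solution, sin(πx), is known; it is not taken from the text.

```python
import numpy as np

n = 99                         # number of interior grid points (assumed)
h = 1.0 / (n + 1)              # grid spacing
x = np.linspace(h, 1 - h, n)   # interior grid points

# Second-difference approximation of -d^2/dx^2 as an n x n matrix.
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

f = np.pi**2 * np.sin(np.pi * x)
u = np.linalg.solve(A, f)      # A is invertible, so the solution exists and is unique

error = np.max(np.abs(u - np.sin(np.pi * x)))
print(np.linalg.matrix_rank(A), error)
```

Here the existence and uniqueness of the discrete solution follow from the invertibility of A, which is the kind of question the invertible matrix theorem addresses.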


8.3.2 Signals and Data Analysis 


In this text, we have seen data in the form of images. But, we hear about data that are collected 
through financial and business, political, medical, social, and scientific research. Data exist on just about 
everything we can measure. As we mentioned in Section 1.1.2, researchers are interested in determining 
ways to glean information from this data, looking for patterns for classification or prediction purposes. 
We also discussed how machine learning is a tool to automate these goals. Here, we will connect the 
aforementioned machine learning tools with the necessary linear algebra tools. 

Fourier analysis is used to recognize patterns in data. The Fourier transform uses an orthonormal 
basis (of sinusoidal functions) in an infinite dimensional function space to rewrite functions in Fourier 
space. Function inner products are used to determine which patterns are most prevalent (larger inner 
products suggest a larger level of importance in the data). Discrete Fourier analysis is similar, but with 
vector inner products. 
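
A small Python (NumPy) sketch of discrete Fourier analysis as a collection of inner products follows. The signal, its length N = 64, and the two frequencies it contains are hypothetical choices for illustration.

```python
import numpy as np

N = 64
t = np.arange(N)

# Hypothetical signal: a strong pattern at frequency 5 and a weak one at frequency 12.
signal = 3.0 * np.cos(2 * np.pi * 5 * t / N) + 0.5 * np.cos(2 * np.pi * 12 * t / N)

# Column k of `basis` is the unit vector w_k with entries exp(2*pi*i*k*t/N) / sqrt(N).
basis = np.exp(2j * np.pi * np.outer(t, t) / N) / np.sqrt(N)

# Coefficient k is the inner product of the signal with w_k.
coeffs = basis.conj().T @ signal

# The largest inner product (among the first N/2 frequencies) marks the dominant pattern.
dominant = int(np.argmax(np.abs(coeffs[: N // 2])))
print(dominant)
```

The columns of `basis` are orthonormal, so the coefficients are coordinates of the signal in that basis; up to the scaling by the square root of N, they agree with the output of `np.fft.fft`.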

Regression analysis is a tool for predicting a result based on prior data. Given a set of user-supplied
functions, {f_k : R^n → R | k = 0, 1, ..., m}, a regression analysis seeks to find a vector a* ∈ R^{m+1}
so that for given data, {(x_i, u_i) | x_i ∈ R^n, u_i ∈ R, i = 1, 2, ..., N}, the vector a* = (a_0*, a_1*, ..., a_m*)
minimizes the total least squares deviation LSD,

$$\mathrm{LSD} = \sum_{i=1}^{N} \left( u_i - \sum_{k=0}^{m} a_k f_k(x_i) \right)^2.$$

The equation u = Σ_{k=0}^{m} a_k f_k(x) is the regression equation used for prediction. Using multivariable
calculus, LSD minimization can be reduced to solving a matrix equation. We note that if the set of
user-supplied functions is not linearly independent, then there is no unique minimizing vector a*. By 
choosing a linearly independent set of functions, we recognize that we are finding the coordinate vector 
from the respective coordinate space that is closest to the given data. On the other hand, if your data 
fit a function that is not in the span of the set of supplied functions, the least-squares deviation could 
be large, meaning that the regression may not be an effective predictor. 
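
The reduction of least squares minimization to a matrix equation can be sketched in Python (NumPy) as follows. The user-supplied functions (1, x, x²) and the data, generated from a known quadratic, are hypothetical. The matrix equation used is the normal equation FᵀF a = Fᵀu, where F has entries f_k(x_i).

```python
import numpy as np

# Hypothetical user-supplied functions f_0, f_1, f_2 on R.
basis_functions = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]

# Hypothetical data generated from u = 1 + 2x - 0.5x^2.
x_data = np.linspace(-2.0, 2.0, 9)
u_data = 1.0 + 2.0 * x_data - 0.5 * x_data**2

# F[i, k] = f_k(x_i); its columns are independent because the functions are.
F = np.column_stack([f(x_data) for f in basis_functions])

# Setting the gradient of the deviation to zero gives F^T F a = F^T u.
a_star = np.linalg.solve(F.T @ F, F.T @ u_data)
lsd = np.sum((u_data - F @ a_star) ** 2)
print(a_star, lsd)
```

If the supplied functions were linearly dependent, FᵀF would be singular and the call to `np.linalg.solve` would fail, reflecting the lack of a unique minimizer.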

Data clustering is a tool that uses relatively few feature vectors {v_1, v_2, ..., v_k} to classify vectors
into k clusters. In general, a data vector u belongs to cluster j if u is more similar to vector v_j than any
other v_i, where i ≠ j. Similarity can be defined in many ways. One such way is using inner products.
The challenges are finding the “best” feature vectors and an optimal number of these “best” feature 
vectors. 
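
A minimal Python (NumPy) sketch of classification by inner products is given below. The feature vectors and data vectors are hypothetical, and the similarity measure (inner product of unit-length vectors) is one assumed choice among the many possible.

```python
import numpy as np

# Hypothetical feature vectors v_1, v_2, v_3 (one per row).
features = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])

# Hypothetical data vectors to classify (one per row).
data = np.array([[0.9, 0.2], [0.1, 2.0], [-3.0, 0.5], [2.0, 1.0]])

def unit_rows(A):
    """Scale each row of A to unit length."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)

# similarity[i, j] is the inner product of unit data vector i with unit feature vector j.
similarity = unit_rows(data) @ unit_rows(features).T

# Each data vector joins the cluster of its most similar feature vector.
labels = np.argmax(similarity, axis=1)
print(labels)
```

Normalizing first means the assignment depends on direction only; using raw inner products or distances instead would be an equally valid, but different, notion of similarity.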

In the event that a researcher has collected data that can already be classified, yet no choice for 
feature vectors is possible, support vector machines use the clustered data to classify any new data 
points. For example, suppose scientists have measured eight key physical and biological attributes for 
a large sample (training set) of kittens and have regularly monitored their health for their first 5 years. 
Kittens that developed kidney problems during this time represent one cluster. Kittens that did not represent
the other cluster. Each kitten is represented as a data vector in R^8. Veterinarians then use a support
vector machine to predict the likelihood of a kitten developing kidney problems long before any issues
arise. The way that support vector machines work is to transform (using a nonlinear transformation)
the set of kitten vectors from R^8 to R^n for some n > 8 in which data clustering is performed using
inner products as described above. The challenges exist in finding the appropriate transformation and 
the appropriate n so that the classification tool is robust, minimizing both false positives and false 
negatives. 

Principal component analysis (PCA) is used to find relevant features or structures in data. This
technique is a simple extension from singular value decomposition (SVD). In fact, the only difference
is that the data need not be centered around the origin for principal component analysis. The first and
last steps in PCA are translating the data so that it is centered at the origin and then translating back after
finding the SVD. Now, since SVD finds the subspace that most closely represents the data, PCA finds the affine
subspace (a subspace translated from the origin) that most closely represents the data. The vectors 
obtained after translating the singular vectors back are called the principal components. Notice that an 
affine subspace is not actually a subspace, rather it is a translation of a subspace. 
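
The translate, decompose, and translate-back steps can be sketched in Python (NumPy) as follows. The data set, points lying exactly on the line y = 2x + 1, is a hypothetical example chosen so that the best-fitting affine subspace is known in advance.

```python
import numpy as np

# Hypothetical data: rows are points on the line y = 2x + 1, which misses the origin.
t = np.linspace(-1.0, 1.0, 11)
data = np.column_stack([t, 2.0 * t + 1.0])

# First step: translate the data so that it is centered at the origin.
center = data.mean(axis=0)
U, s, Vt = np.linalg.svd(data - center, full_matrices=False)

# The first right singular vector spans the best-fitting one-dimensional subspace.
direction = Vt[0]

# Last step: translate back, giving the affine subspace center + span{direction}.
approx = center + np.outer((data - center) @ direction, direction)
print(center, direction, s)
```

Since the points are exactly collinear, the second singular value is numerically zero and the affine line reproduces the data.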


8.3.3 Optimal Design and Decision Making 


Optimal Design is the process of determining parameter values that describe a best product design 
among all possible valid designs. For example, in designing a rudder for a new sailboat, one might 
wish to maximize the responsiveness. The design is specified by several geometric and material choice 
parameters. The parameter choices are limited by weight, clearance, manufacturing capability, financial 
considerations, etc. 
Optimal decision making is the process of determining the best strategy among all possible strategies.
For example, in planning the employee work schedule for the upcoming week, a company would like
a strategy that minimizes paid wages. Some considerations that limit the company’s choices include
ensuring adequate employee presence, allowing for time off, appropriate matching of employee tasks 
and skill sets, and assigning tasks that do not degrade morale. Other considerations might include 
limited employee availability or limited overtime. 

These types of optimization problems, and myriad others, are modeled mathematically as an 
objective function f : R^n → R, which quantifies the quality of a particular choice of variables, and
constraint equalities h(x) = 0 (h : R^n → R^m) and inequalities g(x) ≤ 0 (g : R^n → R^p), which limit
the choices of possible variable values. The idea is to find the valid choice (satisfy all constraints), which 
maximizes (or minimizes) the objective function. Typically, there are infinitely many valid choices, 
but only one optimal choice. 

If the model functions, f, g and h, are linear, then we expect that linear algebra will be a key tool. 
In general, there are infinitely many valid solutions defined by the solution set of a system of linear 
equations, and optimal solutions are obtained by examining the basic solutions of the system. For a 
system with m equations and n unknowns, the basic solutions are found by setting n − m variables (free
variables) to zero and solving for the remaining (basic) variables. Invertibility properties of submatrices 
of the coefficient matrix lead to the presence or absence of basic solutions. This is the study of linear 
programming. 
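
The role of basic solutions can be illustrated by brute-force enumeration on a tiny problem, as in the Python (NumPy) sketch below. The problem data (maximize 3x_1 + 2x_2 subject to x_1 + x_2 ≤ 4, x_1 + 3x_2 ≤ 6, with nonnegative variables and two added slack variables) are hypothetical. Enumeration is used here only for illustration; practical linear programming algorithms avoid examining every basic solution.

```python
import itertools
import numpy as np

# Hypothetical problem in equality form: A x = b, x >= 0, maximize c^T x.
# Variables are (x_1, x_2, s_1, s_2), where s_1 and s_2 are slack variables.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 2.0, 0.0, 0.0])
m, n = A.shape

best_x, best_value = None, -np.inf
for basic in itertools.combinations(range(n), m):
    B = A[:, list(basic)]                  # m x m submatrix for the basic variables
    if abs(np.linalg.det(B)) < 1e-12:
        continue                           # singular submatrix: no basic solution
    x = np.zeros(n)                        # the n - m free variables are set to zero
    x[list(basic)] = np.linalg.solve(B, b)
    if np.all(x >= -1e-12) and c @ x > best_value:
        best_x, best_value = x, c @ x      # keep the best feasible basic solution

print(best_x, best_value)
```

Each candidate requires an invertible m × m submatrix of the coefficient matrix, which is where the invertibility properties mentioned above enter.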

Even when optimization problems involve nonlinear objective and constraint functions, linear
algebra is a necessary tool. Efficient methods for solving these types of problems involve local quadratic
function approximations. If we have quantitative derivative information of the function at x = a,
then the quadratic approximation of f near a is

$$f(x) \approx f(a) + \nabla f(a)^T (x - a) + \frac{1}{2}(x - a)^T \nabla^2 f(a)(x - a),$$

where ∇f(a) is the vector of first partial derivatives, the ith entry of which is
[∇f(a)]_i = ∂f/∂x_i (a), and ∇²f(a) is the Hessian matrix of second partial derivatives, the
i,j-entry of which is [∇²f(a)]_{ij} = ∂²f/(∂x_i ∂x_j) (a). Function and gradient evaluations involve
linear algebra techniques and the eigenvalues of the
Hessian matrix reveal information about the curvature of the function. Together, ∇f and ∇²f provide
specific information on solutions. For example, if the gradient is the zero vector (∇f(x*) = 0) and all
eigenvalues of the Hessian are positive (∇²f(x*)v = λv and v ≠ 0 implies λ > 0), then x* is a
local minimizer of f.
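
The gradient and eigenvalue test can be tried on a small example. In the Python (NumPy) sketch below, the function f(x) = x_1² + x_1 x_2 + 2x_2² − 3x_1 is a hypothetical choice; because it is quadratic, its Hessian is constant and its quadratic approximation is exact.

```python
import numpy as np

def f(x):
    """Hypothetical objective function on R^2."""
    return x[0]**2 + x[0] * x[1] + 2 * x[1]**2 - 3 * x[0]

def gradient(x):
    """Vector of first partial derivatives of f."""
    return np.array([2 * x[0] + x[1] - 3, x[0] + 4 * x[1]])

# Hessian matrix of second partial derivatives (constant for this f).
H = np.array([[2.0, 1.0], [1.0, 4.0]])

def quadratic_model(x, a):
    """f(a) + grad f(a)^T (x - a) + (1/2)(x - a)^T H (x - a)."""
    d = x - a
    return f(a) + gradient(a) @ d + 0.5 * d @ H @ d

# The gradient vanishes where H x = (3, 0).
x_star = np.linalg.solve(H, np.array([3.0, 0.0]))
eigenvalues = np.linalg.eigvalsh(H)

print(x_star, gradient(x_star), eigenvalues)
```

Both eigenvalues of H are positive and the gradient is zero at `x_star`, so the test identifies `x_star` as a local minimizer of this f.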


8.4 How to Move Forward


In this text, we explored particular applications for which Linear Algebra topics presented solution 
techniques. Solution paths for these applications followed a fairly typical procedure. That is, the path 
we took in this book is very similar to the path an applied mathematician takes to solve other real-world 
problems. We recap by describing our procedure very simply as 


1. Explore the data, looking for characteristics that suggest the use of particular techniques that we 
can employ. 

2. Create simplified and smaller data sets (called “toy data”) with similar characteristics (starting with
fewer characteristics than are actually present in the data). 
3. Apply mathematical techniques that were considered in Step 1 on the toy data. Adjust the current
technique so that it can be used on the actual data. 

4. Determine what shortcomings we find in the current technique and how the technique moves
us toward our goals. 

5. Explore creative and new techniques that might be used to overcome the shortfalls of the previous 
techniques. 

6. Add more richness to the toy data and continue the same process with the new toy data and new 
technique ideas. At each iteration, increase the complexity of the data until we find a “good” solution 
to our original problem with the actual data. 


These techniques are valid for many other applications such as those mentioned in Section 8.3, and 
even for applications that are yet to be discovered. Willingness to explore and play with data is key to 
finding novel solutions to real-life problems. 


8.5 Final Words 


There is still much to do with the applications that we focused on in this text. The interested reader is 
encouraged to explore newer techniques that create even better solutions. Further validation for these 
methods is also needed. The methods encouraged by the exploration in Section 7.6 have been used 
with many applications, but the validation and refinement of the method, at the time of publication, are
still open problems.

The interested reader is encouraged to seek out texts that delve deeper into applications that we 
discussed. We encourage you to talk to experts and become part of a community looking for solutions 
to many of these problems. We encourage you to find new applications and new uses for the tools of 
linear algebra. 


Appendix A 


Transmission Radiography and Tomography: 
A Simplified Overview 


This material provides a brief overview of radiographic principles prerequisite to Section 4.1. The 
goal is to develop the basic discrete radiographic operator for axial tomography of the human body. 
To accomplish this goal, it is not necessary to completely understand the details of the physics 
and engineering involved here. We wish to arrive at a mathematical formulation descriptive of the 
radiographic process and establish a standard scenario description with notation. 


A.1 What is Radiography?


Transmission radiography and tomography are familiar and common processes in today’s world, 
especially in medicine and non-destructive testing in industry. Some examples include 


• Single-view X-ray radiography is used routinely to view inside the human body; for example, bone
fracture assessment, mammography, and angiographic procedures.

• Multiple-view X-ray radiography is realized in computerized axial tomography (CAT) scans used
to provide 3D images of body tissues.

• Neutron and X-ray imaging is used in industry to quantify manufactured part assemblies or defects
which cannot be visually inspected.


Definition A.1.1 


Transmission Radiography is the process of measuring and recording changes in a high-energy 
particle beam (X-rays, protons, neutrons, etc.) resulting from passage through an object of interest. 


Definition A.1.2 


Tomography is the process of inferring properties of an unknown object by interpreting radiographs 
of the object. 


X-rays, just like visible light, are photons or electromagnetic radiation, but at much higher energies
and outside of the range of our vision. Because of the wavelength of typical X-rays (on the order
of a nanometer), they readily interact with objects of similar size such as individual molecules or
atoms. This property makes them particularly useful in transmission imaging. Figure A.1 is a cartoon
of a typical X-ray radiographic experiment or procedure. An X-ray beam is produced with known
energy and geometric characteristics. The beam is aimed at a region of interest. The photons interact
with matter in the region of interest, changing the intensity, energy and geometry of the beam. A
detector measures the pattern (and possibly the distribution) of incident energy. The detection data,
when compared to the incident beam characteristics, contains the known signature of the region of
interest. We consider the mathematics and some of the physics involved in each step with the goal of
modeling a radiographic transformation appropriate for mixed soft and hard tissue axial tomography
of the human body.


Fig. A.1 Typical radiographic experiment. Labeled elements, from left to right: X-ray source, incident X-ray beam,
region of interest, transmitted X-ray beam, detector.


A.2 The Incident X-ray Beam


We begin with an X-ray beam in which the X-ray photons all travel parallel to each other in the beam 
direction, which we take to be the positive x-direction. Additionally we assume that the beam is of short 
time duration, the photons being clustered in a short pulse instead of being continuously produced. 
A beam with these geometric characteristics is usually not directly achievable through typical X-ray 
sources (see supplementary material in Section A.8 for some discussion on X-ray sources). However, 
such a beam can be approximated readily through so-called collimation techniques which physically 
limit the incident X-rays to a subset that compose a planar (neither convergent nor divergent) beam. 

While not entirely necessary for the present formulation, we consider a monochromatic X-ray 
source. This means that every X-ray photon produced by the source has exactly the same “color.” The 
term “monochromatic” comes from the visible light analog in which, for example, a laser pointer may 
produce photons of only one color, red. The energy of a single photon is proportional to its frequency 
ν, or inversely proportional to its wavelength λ. The frequency, or wavelength, determines the color, in
exact analogy with visible light. In particular, the energy of a single photon is hν where the constant
of proportionality h is known as Planck’s constant. 

The intensity (or brightness), E(x, y, z), of a beam is proportional to the photon density. The 
intensity of the beam just as it enters the region of interest at x = 0 is assumed to be the same as the 
intensity at the source. We write both as E(0, y, z). It is assumed that this quantity is well known or 
independently measurable.
Fig. A.2 X-ray beam attenuation computation for a beam path of fixed y and z. Figure annotation:
dE(x) = E(x + dx) − E(x) = −μρ(x)E(x) dx.


A.3 X-Ray Beam Attenuation


As the beam traverses the region of interest, from x = 0 to x = D_x (see Figure A.2), the intensity
changes as a result of interactions with matter. In particular, a transmitted (resultant) intensity
E(D_x, y, z) exits the far side of the region of interest. We will see that under reasonable assumptions
this transmitted intensity is also planar but is reduced in magnitude. This process of intensity reduction 
is called attenuation. It is our goal in this section to model the attenuation process. 

X-ray attenuation is a complex process that is a combination of several physical mechanisms (see 
Section A.9) describing both scattering and absorption of X-rays. We will consider only Compton 
scattering which is the dominant process for typical medical X-ray radiography. In this case, attenuation 
is almost entirely a function of photon energy and material mass density. As we are considering 
monochromatic (monoenergetic) photons, attenuation is modeled as a function of mass density only. 

Consider the beam of initial intensity E(0, y, z) passing through the region of interest, at fixed y
and z. The relative intensity change in an infinitesimal distance from x to x + dx is proportional to the
mass density ρ(x, y, z) and is given by

$$dE(x, y, z) = E(x + dx, y, z) - E(x, y, z) = -\mu \rho(x, y, z) E(x, y, z)\,dx,$$

where μ is a factor that is nearly constant for many materials of interest. We also assume that any
scattered photons exit the region of interest without further interaction. This is the so-called single-scatter
approximation which dictates that the intensity remains planar for all x.

Integrating over the path from $x = 0$, where the initial beam intensity is $E(0, y, z)$, to $x = D_x$, where
the beam intensity is $E(D_x, y, z)$, yields


$$\frac{dE(x, y, z)}{E(x, y, z)} = -\mu\,\rho(x, y, z)\,dx,$$

$$\int_{E(0, y, z)}^{E(D_x, y, z)} \frac{dE(x, y, z)}{E(x, y, z)} = -\mu \int_0^{D_x} \rho(x, y, z)\,dx,$$

$$E(D_x, y, z) = E(0, y, z)\, e^{-\mu \int_0^{D_x} \rho(x, y, z)\,dx}.$$


This expression shows us how the initial intensity is reduced, because photons have been scattered
out of the beam. The relative reduction depends on the density (or mass) distribution in the region of
interest.
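
The exponential attenuation law described above can be checked numerically. The following is a minimal sketch, not part of the model itself: the density profile `density`, the values chosen for `e0`, `mu`, and `d_x`, and both function names are assumptions made purely for illustration. It compares the integrated (exponential) form with a direct step-by-step accumulation of $dE = -\mu \rho E\,dx$.

```python
import math


def density(x):
    """Hypothetical mass density along the beam path (assumed for illustration)."""
    return 1.0 + 0.5 * x


def transmitted_closed_form(e0, mu, d_x, steps=10_000):
    """E(D_x) = E(0) * exp(-mu * integral of rho), with the integral by the midpoint rule."""
    dx = d_x / steps
    mass_integral = sum(density((i + 0.5) * dx) for i in range(steps)) * dx
    return e0 * math.exp(-mu * mass_integral)


def transmitted_stepwise(e0, mu, d_x, steps=10_000):
    """Accumulate dE = -mu * rho(x) * E(x) * dx over many small steps."""
    dx = d_x / steps
    intensity = e0
    for i in range(steps):
        intensity += -mu * density(i * dx) * intensity * dx
    return intensity


closed = transmitted_closed_form(e0=100.0, mu=0.2, d_x=2.0)
stepwise = transmitted_stepwise(e0=100.0, mu=0.2, d_x=2.0)
print(closed, stepwise)  # the two values should agree closely
```

With a finer step size the stepwise value approaches the exponential form, which is the sense in which the integral expression summarizes the infinitesimal attenuation rule.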


A.4 Radiographic Energy Detection


The transmitted intensity $E(D_x, y, z)$ continues to travel on to a detector (e.g., film) which records
the total detected energy in each of $m$ detector bins. The detected energy in any bin is the intensity
integrated over the bin cross-sectional area. Let $p_k$ be the number of X-ray photons collected at detector
bin $k$. Then $p_k$ is the collected intensity integrated over the bin area and divided by the photon energy:


$$p_k = \frac{1}{h\nu} \iint_{(\text{bin } k)} E(0, y, z) \left( e^{-\mu \int_0^{D_x} \rho(x, y, z)\,dx} \right) dy\,dz.$$


Let the bin cross-sectional area, $\sigma$, be small enough so that both the contributions of the density and
intensity to the bin area integration are approximately a function of $x$ only. Then


$$p_k = \frac{\sigma E(0, y_k, z_k)}{h\nu}\, e^{-\mu \int_0^{D_x} \rho(x, y_k, z_k)\,dx},$$


where $y_k$ and $z_k$ locate the center of bin $k$. Let $p_k^0$ be the number of X-ray photons initially aimed at
bin $k$, $p_k^0 = \sigma E(0, y_k, z_k)/h\nu$. Due to attenuation, $p_k < p_k^0$ for each bin:


$$p_k = p_k^0\, e^{-\mu \int_0^{D_x} \rho(x, y_k, z_k)\,dx}.$$


Equivalently, we can write (multiplying the exponent argument by $\sigma/\sigma$):


$$p_k = p_k^0\, e^{-\frac{\mu}{\sigma} \int_0^{D_x} \sigma \rho(x, y_k, z_k)\,dx}.$$


The remaining integral is the total mass in the region of interest that the X-ray beam passes through to
get to bin $k$. We will call this mass $s_k$. Now we have


$$p_k = p_k^0\, e^{-s_k/\alpha},$$


where $\alpha = \sigma/\mu$. This expression tells us that the number of photons in the part of the beam directed
at bin $k$ is reduced by a factor that is exponential in the total mass encountered by the photons.
Finally, we note that the detector bins correspond precisely to pixels in a radiographic image.
Fig. A.3 Object space and radiograph space discretization.


A.5 The Radiographic Transformation Operator 


We consider a region of interest subdivided into $N$ cubic voxels (3-dimensional pixels). Let $x_j$ be the
mass in object voxel $j$ and $T_{kj}$ the fraction of voxel $j$ in beam path $k$ (see Figure A.3). Then the mass
along beam path $k$ is


$$s_k = \sum_{j=1}^{N} T_{kj} x_j,$$


and the expected photon count at radiograph pixel $k$, $p_k$, is given by


$$p_k = p_k^0\, e^{-\frac{1}{\alpha} \sum_{j=1}^{N} T_{kj} x_j},$$


or equivalently,


$$b_k \equiv \left( -\alpha \ln \frac{p_k}{p_k^0} \right) = \sum_{j=1}^{N} T_{kj} x_j.$$


The new quantities $b_k$ represent a variable change that allows us to formulate the matrix expression
for the radiographic transformation


$$b = Tx.$$


Fig. A.4 Example axial radiography scenario with six equally spaced views. Horizontal slices of the object project to
horizontal rows in the radiographs.


This expression tells us that given a voxelized object mass distribution image $x \in \mathbb{R}^N$, the expected
radiographic data (mass projection) is the image $b \in \mathbb{R}^m$, with the two connected through the radiographic
transformation $T \in \mathcal{M}_{m \times N}(\mathbb{R})$. The mass projection $b$ and actual photon counts $p$ and $p^0$ are related
as given above. It is important to note that $b_k$ is defined only for $p_k > 0$. Thus, this formulation is only
valid for radiographic scenarios in which every radiograph detector pixel records at least one hit. This
is always the case for medical applications, which require high contrast and high signal-to-noise ratio
data.
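
The chain from object masses to photon counts and back can be sketched in a few lines. This is an illustrative sketch only: the operator `T`, the masses `x`, the incident counts `p0`, the value of `alpha`, and all function names are made-up assumptions, not data from any real radiographic setup.

```python
import math


def mass_projection(T, x):
    """Compute b = T x, where row k of T holds the fractions T_kj for beam path k."""
    return [sum(t_kj * x_j for t_kj, x_j in zip(row, x)) for row in T]


def expected_counts(b, p0, alpha):
    """Expected photon counts p_k = p0_k * exp(-b_k / alpha)."""
    return [p0_k * math.exp(-b_k / alpha) for b_k, p0_k in zip(b, p0)]


def counts_to_projection(p, p0, alpha):
    """Invert the count model: b_k = -alpha * ln(p_k / p0_k), valid only for p_k > 0."""
    return [-alpha * math.log(p_k / p0_k) for p_k, p0_k in zip(p, p0)]


# Hypothetical 2-pixel, 4-voxel setup (all numbers are made up for illustration).
T = [[1.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 1.0, 1.0]]
x = [2.0, 4.0, 1.0, 3.0]
p0 = [1000.0, 1000.0]
alpha = 5.0

b = mass_projection(T, x)                         # mass along each beam path
p = expected_counts(b, p0, alpha)                 # attenuated photon counts
b_recovered = counts_to_projection(p, p0, alpha)  # back to mass projections
print(b, p, b_recovered)
```

The logarithmic change of variable is what turns the exponential count model into a linear relationship between the object and the data.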


A.6 Multiple Views and Axial Tomography 


Thus far, we have a model that can be used to compute a single radiograph of the region of interest. 
In many applications, it is beneficial to obtain several or many different views of this region. In some 
industrial applications, the region of interest can be rotated within the radiographic apparatus, with 
radiographs obtained at various rotation angles. In medical applications the radiographic apparatus is 
rotated about the region of interest (including the subject!). In this latter case, the voxelization remains 
fixed and the coordinate system rotates. For each of $a$ view angles, the new $m$ pixel locations require
calculation of new mass projections $T_{kj}$. The full multiple-view operator contains mass projections
onto all $M = a \cdot m$ pixel locations. Thus, for multiple-view radiography, $x$ is still a vector in $\mathbb{R}^N$, but
$b$ is a vector in $\mathbb{R}^M$ and $T$ is a matrix operator in $\mathcal{M}_{M \times N}(\mathbb{R})$.
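
One way to picture the multiple-view operator is as the single-view operators stacked on top of one another. The sketch below assumes a tiny made-up scenario (a 2 x 2 voxel grid, so $N = 4$, with $m = 2$ pixels per view and $a = 3$ views); the entries of each `T_view_*` and the helper names are hypothetical.

```python
def stack_views(view_operators):
    """Stack per-view operators (each m x N, stored as a list of rows) into one M x N operator."""
    stacked = []
    for T_view in view_operators:
        stacked.extend(T_view)
    return stacked


def apply_operator(T, x):
    """Compute the matrix-vector product T x."""
    return [sum(t * x_j for t, x_j in zip(row, x)) for row in T]


# Hypothetical example: N = 4 voxels, m = 2 pixels per view, a = 3 views.
T_view_0 = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
T_view_1 = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
T_view_2 = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]

T = stack_views([T_view_0, T_view_1, T_view_2])
x = [1.0, 2.0, 3.0, 4.0]
b = apply_operator(T, x)
print(len(T), len(T[0]), b)  # M rows, N columns, and the stacked mass projections
```

The object vector is unchanged by adding views; only the number of rows of the operator, and hence the length of the data vector, grows.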

Finally, we make the distinction between the general scenario and axial tomography (CAT scans). 
In principle, we could obtain radiographs of the region of interest from any direction (above, below, 
left, right, front, back, etc.). However, in axial tomography the physical limitations of the apparatus 
and subject placement dictate that views from some directions are not practical. The simplest scenario 
is to obtain multiple views by rotating the apparatus about a fixed direction perpendicular to the 
beam direction. This is why CAT machines have a donut or tube shaped appearance within which the 
apparatus is rotated. The central table allows the subject to rest along the rotation axis and the beam 
can pass through the subject along different trajectories. 

This axial setup also simplifies the projection operator. If we consider the $\ell^{\text{th}}$ slice of the region of
interest, described by an $n \times n$ array of $N$ voxels, the mass projections of this slice will only occur
in the $\ell^{\text{th}}$ row of pixels in each radiographic view (see Figure A.4). As a result, 3D reconstructions
can be obtained by a series of independent 2D reconstructed slices. For example, the brown slice of
the spherical object (represented in $\mathbb{R}^N$) is related to the collection of brown rows of the radiographs
(represented in $\mathbb{R}^M$) through the projection operator $T \in \mathcal{M}_{M \times N}(\mathbb{R})$. The black slice and black rows
are related through the same projection operator.


A.7 Model Summary


The list below gathers the various mathematical quantities of interest. 


• $N$ is the number of object voxels.
• $M$ is the number of radiograph pixels.
• $x \in \mathbb{R}^N$ is the material mass in each object voxel.
• $b \in \mathbb{R}^M$ is the mass projection onto each radiograph pixel.
• $p \in \mathbb{R}^M$ is the photon count recorded at each radiograph pixel.
• $T \in \mathcal{M}_{M \times N}(\mathbb{R})$ is the voxel volume projection operator. $T_{ij}$ is the fractional volume of voxel $j$ which
  projects orthogonally onto pixel $i$.
• $p_k^0$ is the incident photon count per radiograph pixel.
• $b_k = -\alpha \ln \dfrac{p_k}{p_k^0}$.
• $b = Tx$ is the (mass projection) radiographic transformation.


The description of images (objects and radiographs) as vectors in $\mathbb{R}^N$ and $\mathbb{R}^M$ is computationally
useful and more familiar than a vector space of images. One should keep in mind that this is a particular
representation for images which is useful as a tool but is not geometrically descriptive. The price we
pay for this convenience is that we no longer have the geometry of the radiographic setup (pixelization
and voxelization) encoded in the representation.

A vector in $\mathbb{R}^3$, say $(1, 2, 5)$, is a point in a 3-dimensional space with coordinates described relative
to three orthogonal axes. We can actually locate this point and plot it. An image represented in $\mathbb{R}^3$, say
$(1, 2, 5)$, is not a point in this space. Without further information about the vector space of which it is
a member, we cannot draw this image. The use of $\mathbb{R}^3$ allows us to perform scalar multiplication and
vector addition on images because these operations are equivalently defined on $\mathbb{R}^3$.
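
The point that a coordinate vector carries the pixel values but not the geometry can be made concrete. In this sketch the row-by-row flattening order, the sample images, and the helper names are all assumptions chosen for illustration; any fixed ordering of pixels would serve equally well.

```python
def image_to_vector(image):
    """Flatten a 2D grid of pixel values row by row into a single list."""
    return [value for row in image for value in row]


def vector_to_image(vector, n_rows, n_cols):
    """Rebuild the grid; the shape must be supplied because the vector does not carry it."""
    return [vector[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def add(u, v):
    """Entrywise vector addition."""
    return [a + b for a, b in zip(u, v)]


def scale(c, u):
    """Scalar multiplication."""
    return [c * a for a in u]


image_a = [[1, 2], [5, 0]]
image_b = [[0, 1], [1, 1]]
combined = add(scale(2, image_to_vector(image_a)), image_to_vector(image_b))
print(combined)                          # a vector in R^4
print(vector_to_image(combined, 2, 2))   # the same data viewed as a 2 x 2 image
print(vector_to_image(combined, 1, 4))   # or as a 1 x 4 image: the vector alone cannot tell
```

Scalar multiplication and addition act on the flattened vectors exactly as they would on the images, while the layout has to be remembered separately.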


A.8 Model Assumptions 


The radiography model we have constructed is based on a number of approximations and assumptions 
which we list here. This list is not comprehensive, but it does gather the most important concepts. 


• Monochromaticity. Laboratory sources generate X-rays with a wide range of energies as
  a continuous spectrum. We say that such X-ray beams are polychromatic. One method of
  approximating a monochromatic beam is to precondition the beam by having the X-rays pass
  through a uniform material prior to reaching the region of interest. This process preferentially
  attenuates the lower energy photons, leaving only the highest energy photons. This process is known
  as beam-hardening. The result is a polychromatic beam with a narrower range of energies. We can
  consider the beam to be approximately monochromatic, especially if the attenuation coefficient(s)
  of the material, $\mu$, is not a strong function of photon energy.

• Geometric Beam Characteristics. Laboratory sources do not naturally generate planar X-ray
  beams. It is more characteristic to have an approximate point source with an intensity pattern that
  is strongly directionally dependent. Approximate planar beams with relatively uniform intensity
  $E(0, y, z)$ can be achieved by selective beam shielding and separation of source and region of
  interest. In practice, it is common to use the known point source or line source characteristics instead
  of assuming a planar beam. The model described here is unchanged except for the computation of
  $T$ itself.

• Secondary Radiation. Our model uses a single-scatter approximation in which, if a photon
  undergoes a Compton scatter, it is removed from the analysis. In fact, X-rays can experience
  multiple scatter events as they traverse the region of interest or other incidental matter (such as the
  supporting machinery). The problematic photons are those that scatter one or more times and reach
  the detector. This important secondary effect is often approximated by more advanced models.

• Energy-Dependent Attenuation. The attenuation coefficient $\mu$, which we have taken to be
  constant, is not only somewhat material dependent but is also beam energy dependent. If the beam
  is truly monochromatic this is not a problem. However, for a polychromatic beam the transmitted
  total energy will depend on the distribution of mass along a path, not just the total mass.

• Other Attenuation Mechanisms. We have included only Compton scattering in the model. Four
  other mechanisms (outlined in the supplementary material) contribute to the attenuation. While
  Compton scattering is the dominant contributor, photoelectric scattering will have some effect.
  It becomes important at the lower energies of interest and for materials of relatively high atomic
  number, such as calcium, which is concentrated in bone. The major effect of ignoring photoelectric
  scattering is quantitative mass uncertainty.

• Detector-Related Effects. There are a number of detector-related effects which affect radiograph
  accuracy. Energy detection efficiency can be a function of beam intensity, photon energy, and even
  pixel location. Detectors are subject to point-spread effects in which even an infinitesimally narrow
  beam results in a finitely narrow detection spot. These types of effects are usually well understood
  and documented, and can be corrected for in the data. Detection is also prone to noise from secondary
  radiation, background radiation, or simply manufacturing variances.


A.9 Additional Resources 


There is a variety of online source material that expands on any of the material presented here. Here
are a few starting points.


https://www.nde-ed.org/EducationResources/CommunityCollege/Radiography/c_rad_index.htm


http://web.stanford.edu/group/glam/xlab/MatSci162_172/LectureNotes/01_Properties%20&%20Safety.pdf


http://radiologymasterclass.co.uk/tutorials/physics/x-ray_physics_production.html


Several physical processes contribute to absorption and scattering of individual photons as they 
pass through matter. Collectively, these processes alter the geometry and intensity of a beam of such 
photons. What follows is a brief description of each. 


• The Photoelectric Effect is the absorption of an X-ray photon by an atom accompanied by the
  ejection of an outer shell electron. This ionized atom then re-absorbs an electron and emits an
  X-ray of energy characteristic of the atom. The daughter X-ray is a low-energy photon which is
  quickly re-absorbed and is effectively removed from the X-ray beam. Photoelectric absorption is
  the dominant process for photon energies below about 100 keV and when interacting with materials
  of high atomic number.

• Rayleigh Scattering is the process of a photon interacting with an atom without energy loss. The
  process is similar to the collision of two billiard balls. Rayleigh scattering is never the dominant
  mechanism, at any energy.

• Compton Scattering occurs when an X-ray photon interacts with an electron, imparting some
  energy to the electron. Both electron and photon are emitted and the photon undergoes a directional
  change or scatter. Compton scattering is the dominant process for soft tissue at photon energies
  from about 100 keV to about 8 MeV.

• Pair Production is the process in which a photon is absorbed, producing a positron-electron pair.
  The positron quickly annihilates with an electron, producing two 511 keV photons. Pair production
  is only significant for photon energies of several MeV or more.

• Photodisintegration can occur for photons of very high energy. Photodisintegration is the process
  of absorption of the photon by an atomic nucleus and the subsequent ejection of a nuclear particle.
Exercises


1. Given $T_{kj} = 0.42$, what does this value mean?

2. What is accomplished in the matrix multiply when multiplying an object vector by the $k$th row of
   $T$?

3. Explain how you change the radiographic operator when changing from one view to many.

4. Why do we use pixel values $b_k$, where $b_k$ is defined as


   $$b_k \equiv -\alpha \ln \left( \frac{p_k}{p_k^0} \right)$$


   instead of the expected photon count $p_k$?


Appendix B 
The Diffusion Equation 


The 1-dimensional diffusion equation, or heat equation, is the partial differential equation


$$\frac{\partial f}{\partial t} = \frac{\partial}{\partial x} \left( \kappa(x, t) \frac{\partial f}{\partial x} \right),$$


where $\kappa$ is the position- and time-dependent diffusion coefficient. The diffusing quantity, $f$, also
depends on both position and time. For example, $f(x, t)$ could describe the distribution of heat on a
rod, or the concentration of a contaminant in a tube of liquid. For uniform static materials, we can
assume $\kappa = 1$ and use the homogeneous diffusion equation


$$\frac{\partial f}{\partial t} = \frac{\partial^2 f}{\partial x^2}.$$


Solutions to this simplified diffusion equation, $f(x, t)$, describe how the quantity $f$ evolves in time
for all positions $x$. The rate at which $f$ changes in time is equal to the second derivative of $f$ with
respect to position $x$. When the second derivative is positive, the function $f(x)$ is concave up, and $f$
is increasing in this region where $f$ is locally relatively small. Similarly, when the second derivative
is negative, the function $f(x)$ is concave down, and $f$ is decreasing in this region where $f$ is locally
relatively large. We also notice that the degree of concavity dictates how fast $f(x)$ is changing. Sharp
peaks or valleys in $f(x)$ change faster than gentle peaks or valleys. Notice also that where $f(x)$ has
nearly constant slope, the concavity is nearly zero and we expect $f(x)$ to be constant in time. Finally,
the sign of the rate of change guarantees that

$$\lim_{t \to \infty} \frac{\partial^2 f}{\partial x^2} = 0.$$


Boundary Conditions 


We consider the one-dimensional diffusion equation on the interval [a, b] with f(a, t) = f(b, t) = 0. 
This particular boundary condition is appropriate for our discussion and solution of the heat state 
evolution problem used throughout this text. 
Discretization in Position


We will discretize $f(x, t)$ (in position) by sampling the function values at $m$ regularly spaced locations
in the interval $[a, b]$. For $L = b - a$, we define $\Delta x = \frac{L}{m+1} = \frac{b-a}{m+1}$. Then, the sampling locations are
$a + \Delta x, a + 2\Delta x, \ldots, a + m\Delta x$. We then define the discretized sampling as a vector $u$ in $\mathbb{R}^m$:


$$u = [u_1, u_2, \ldots, u_m] = [f(a + \Delta x), f(a + 2\Delta x), \ldots, f(a + m\Delta x)],$$


where we have suppressed the time dependence for notational clarity. Notice that if $u_j = f(x)$ for
some $x \in [a, b]$, then $u_{j+1} = f(x + \Delta x)$ and $u_{j-1} = f(x - \Delta x)$.
Now, because derivatives are linear, intuitively we believe that it should be possible to get a good
discrete approximation for $\partial f/\partial x$ and $\partial^2 f/\partial x^2$. We will use the definition of derivative


$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$


Notice that in a discrete setting, it is not possible for $h \to 0$. The smallest $h$ can be is the sampling
distance $\Delta x$. Let’s use this to make the matrix representation for the second derivative operator, which
we build from a matrix we call $D_2$. That is, $\frac{1}{(\Delta x)^2} D_2 u$ approximates $\partial^2 f/\partial x^2$.


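$$\begin{aligned}
\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x} \frac{\partial f}{\partial x}
&= \frac{\partial}{\partial x} \left( \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \right) \\
&\approx \frac{\partial}{\partial x} \left[ \frac{f(x + \Delta x) - f(x)}{\Delta x} \right] \\
&= \frac{1}{\Delta x} \lim_{h \to 0} \left[ \frac{f(x + \Delta x) - f(x + \Delta x - h)}{h} - \frac{f(x) - f(x - h)}{h} \right] \\
&\approx \frac{1}{(\Delta x)^2} \left[ f(x + \Delta x) - 2 f(x) + f(x - \Delta x) \right].
\end{aligned}$$


Notice that we have used both a forward and backward difference definition of the derivative in order to
make our approximation symmetric. This helps us keep the later linear algebra manageable. Applying
this result to our discrete state $u$:


$$\frac{\partial^2 u_j}{\partial x^2} = \frac{1}{(\Delta x)^2} \left( u_{j+1} - 2u_j + u_{j-1} \right). \tag{B.1}$$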


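The matrix representation for this second derivative operator (in the case of $m = 6$) is


$$\frac{1}{(\Delta x)^2} D_2 = \frac{1}{(\Delta x)^2}
\begin{pmatrix}
-2 & 1 & 0 & 0 & 0 & 0 \\
1 & -2 & 1 & 0 & 0 & 0 \\
0 & 1 & -2 & 1 & 0 & 0 \\
0 & 0 & 1 & -2 & 1 & 0 \\
0 & 0 & 0 & 1 & -2 & 1 \\
0 & 0 & 0 & 0 & 1 & -2
\end{pmatrix}.$$


For any discrete vector $u \in \mathbb{R}^m$, $\frac{1}{(\Delta x)^2} D_2 u \in \mathbb{R}^m$ is the discrete approximation to the second spatial
derivative.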
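
The tridiagonal second-difference matrix can be built and tested directly. The following is a minimal sketch under stated assumptions: the interval $[0, 1]$, the test function $f(x) = \sin(\pi x)$ (chosen because it vanishes at both endpoints), the sample count `m = 99`, and the helper names are all illustrative choices rather than anything prescribed by the text.

```python
import math


def second_difference_matrix(m):
    """Build the m x m matrix D_2 with -2 on the diagonal and 1 on both adjacent diagonals."""
    D2 = [[0.0] * m for _ in range(m)]
    for i in range(m):
        D2[i][i] = -2.0
        if i > 0:
            D2[i][i - 1] = 1.0
        if i < m - 1:
            D2[i][i + 1] = 1.0
    return D2


def matvec(A, u):
    """Compute the matrix-vector product A u."""
    return [sum(a * u_j for a, u_j in zip(row, u)) for row in A]


# Assumed test function: f(x) = sin(pi x) on [0, 1], which vanishes at both endpoints.
m = 99
dx = 1.0 / (m + 1)
xs = [(j + 1) * dx for j in range(m)]
u = [math.sin(math.pi * x) for x in xs]

approx = [v / dx**2 for v in matvec(second_difference_matrix(m), u)]
exact = [-math.pi**2 * math.sin(math.pi * x) for x in xs]
max_error = max(abs(p - q) for p, q in zip(approx, exact))
print(max_error)  # small compared with pi**2
```

The first and last rows of the matrix have only two nonzero entries because the neighboring boundary values are zero, which matches the boundary condition $f(a, t) = f(b, t) = 0$.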
Discretization in Time


The quantity $f$ also changes in time. Again, using the definition of derivative, we have


$$\frac{\partial f}{\partial t} = \lim_{h \to 0} \frac{f(t + h) - f(t)}{h}.$$


In this case, we consider a finite time step $\Delta t$ and approximate the derivative as


$$\frac{\partial f}{\partial t} \approx \frac{f(t + \Delta t) - f(t)}{\Delta t}.$$


Then, using our discrete representation,

$$\frac{\partial u_j}{\partial t} \approx \frac{u_j(t + \Delta t) - u_j(t)}{\Delta t}.$$


Discrete Diffusion Equation 


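Now, combining our discretized approximations in both time and position, we have the discrete
diffusion equation


$$\frac{u(t + \Delta t) - u(t)}{\Delta t} = \frac{1}{(\Delta x)^2} D_2 u(t).$$

Or, more simply,

$$u(t + \Delta t) = u(t) + \frac{\Delta t}{(\Delta x)^2} D_2 u(t) = \left( I + \frac{\Delta t}{(\Delta x)^2} D_2 \right) u(t).$$

Finally, we define the $m \times m$ matrix $E$ as $E = I + \frac{\Delta t}{(\Delta x)^2} D_2$. Then we have the discrete diffusion evolution
matrix equation

$$u(t + \Delta t) = E u(t).$$


The matrix $E$ is given by


$$E = \begin{pmatrix}
1 - 2\delta & \delta & 0 & & \\
\delta & 1 - 2\delta & \delta & 0 & \\
 & \ddots & \ddots & \ddots & \\
 & 0 & \delta & 1 - 2\delta & \delta \\
 & & 0 & \delta & 1 - 2\delta
\end{pmatrix}, \tag{B.2}$$

where $\delta = \frac{\Delta t}{(\Delta x)^2}$. $E$ is a symmetric matrix with nonzero entries on the main diagonal and on both
adjacent diagonals. All other entries in $E$ are zero.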

Time Evolution 

Now, we consider the evolution over more than one time step.


$$\begin{aligned}
u(t + \Delta t) &= E u(t), \\
u(t + 2\Delta t) &= E u(t + \Delta t) = E E u(t) = E^2 u(t).
\end{aligned}$$


This means $k$ time steps later,

$$u(t + k\Delta t) = E^k u(t).$$
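
Repeated application of the evolution matrix can be sketched as a simple loop. The setup here is an assumption for illustration only: `m = 9` interior samples on $[0, 1]$, the value `delta = 0.25`, the sine-shaped initial state, and the helper names are not taken from the text.

```python
import math


def evolution_matrix(m, delta):
    """Build E = I + delta * D_2: 1 - 2*delta on the diagonal, delta on adjacent diagonals."""
    E = [[0.0] * m for _ in range(m)]
    for i in range(m):
        E[i][i] = 1.0 - 2.0 * delta
        if i > 0:
            E[i][i - 1] = delta
        if i < m - 1:
            E[i][i + 1] = delta
    return E


def matvec(A, u):
    """Compute the matrix-vector product A u."""
    return [sum(a * u_j for a, u_j in zip(row, u)) for row in A]


def evolve(E, u, k):
    """Return E^k u by applying E to the state k times."""
    for _ in range(k):
        u = matvec(E, u)
    return u


# Assumed setup: m = 9 interior samples on [0, 1] and delta = 0.25 (illustrative values).
m, delta, k = 9, 0.25, 20
dx = 1.0 / (m + 1)
u0 = [math.sin(math.pi * (j + 1) * dx) for j in range(m)]
uk = evolve(evolution_matrix(m, delta), u0, k)
print(max(u0), max(uk))  # the peak of the heat state decreases over time
```

Applying $E$ to the state $k$ times gives the same result as multiplying by $E^k$, without ever forming the matrix power explicitly.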


Appendix C 
Proof Techniques 


In the course of reading and writing your own careful mathematical justifications (proofs), you will 
gain a deeper understanding of linear algebra. In addition, you will develop your skills in reading and 
communicating mathematics. To give the reader examples and practice, we present several proofs and 
request several proofs in exercises as well. In addition, we cover, in this appendix, techniques and 
etiquette commonly seen in proofs. First, we give rules of logic to help understand what follows. Then 
we will outline the various standard proof techniques that will be helpful in studying linear algebra. 
We wrap up this appendix with rules of etiquette for proof writing. It is important to note that this 
is not a complete discussion about proof writing. We encourage readers to consult other resources 
on mathematical communication and proof writing to gain more perspectives and a bigger picture of 
mathematical discourse. 


C.1 Logic


In this section, we lay out standard rules of mathematical logic and the interpretation of words such as
‘or’ and ‘and.’ We begin with the mathematical definition of a ‘statement,’ which is different from the
colloquial definition as a declaration in speech or writing.


Definition C.1.1 


A statement is a sentence with a truth value. 


Before giving examples, we need to understand ‘truth value.’ 
A statement has a truth value if it is either true or it is false. There are many statements for which 
we do not know the truth value, but it is clear that there is a truth value. For example, 


“There is life in another galaxy.” 


is a statement. We do not have the technology to check all galaxies for life and we haven’t found life 
yet. (Who knows, maybe by the time this book is published, we do know the truth about this statement.) 
But, we do know that it must be either true or false. 


Example C.1.2 Here are a few more statements.


1. 5 is an odd number.
2. 6 is the smallest integer.
3. If $x = 0$ then $3x + 1 = 1$.
4. Whenever $3x^2 + 2 = 5$ is true, $x = 6$ is also true.
5. Everyone who reads this book is taking a linear algebra course.
6. Some cats are pets.


As you can see, not all of the statements above are true. Indeed, statements 1, 3, and 6 are true, but
statements 2, 4, and 5 are false.


Example C.1.3 A statement cannot be a phrase, a command, an exclamation, an incomplete idea, or an
opinion. For example:


A. Blue 

B. Cats are the best pets. 
C. Wow! 

D. Prove that x = 2. 


(A.) in Example C.1.3 is not even a sentence. The sentence C.1.3(B.) is not a statement. You may 
want to argue that it is true, but it is only an opinion. Sentence (C.) is not something about which we 
would discuss the truth. Finally, though x = 2 may or may not be true, (D.) has no truth value as it is 
commanding one to prove a statement. 

We can combine statements to make new statements using the conjunctions and, or, if...then, and if 
and only if. There is a precise way to determine the truth of the resulting statements, using the following 
rules. 

When ‘and’ is used to connect two statements, the new statement is true only when both original 
statements are true. Hence, the statement “5 is an odd number and 6 is the smallest integer” is false 
because “6 is the smallest integer” is false. 

When ‘or’ is used to connect two statements, the new statement is false only when both original
statements are false. In other words, the new statement is true when at least one of the component
statements is true. The statement “5 is an odd number or 6 is the smallest integer” is true because “5
is an odd number” is true.

When using “If...then...” to create a new statement from two others, the statement that directly
follows “If” is called the hypothesis. The statement that follows “then” is called the conclusion. If-then
statements are false whenever the hypothesis is true and the conclusion is false. They are true in every
other situation. The statement “If unicorns exist, then I can ride them to the moon.” is true because
the hypothesis, “Unicorns exist,” is false. “If...then...” statements are called implications. Another way
to phrase an implication is as “... implies ...” For example, the statement


“The number $x$ is divisible by 4 implies that $x$ is even.”

is equivalent (i.e., they have the same truth value) to

“If $x$ is divisible by 4 then $x$ is even.”

Switching the hypothesis and conclusion results in the statement

“If $x$ is even then $x$ is divisible by 4.”


We know that $x$ being even does not imply that $x$ is divisible by 4. Therefore, the resulting statement is
false.

When using “if and only if” to combine two statements, we are writing an equivalence between two
statements. That is, an if-and-only-if statement is true when both statements are true or when both are
false. The statement “The integer $x$ is even if and only if 2 divides $x$.” is a true statement. We know
this because either $x$ is even or $x$ is odd. If $x$ is even, then “$x$ is even” and “2 divides $x$” are both true.
If $x$ is odd, then “$x$ is even” and “2 divides $x$” are both false.

In the truth table below, we have indicated the truth for the various cases that occur when connecting
two statements. Given two statements, $P$ and $Q$, the following notation will allow us to compactly
summarize a variety of situations in Table C.1.


Notation              Meaning
$T$                   true
$F$                   false
$P \wedge Q$          $P$ and $Q$
$P \vee Q$            $P$ or $Q$
$P \Rightarrow Q$     $P$ implies $Q$, or: if $P$ then $Q$
$\sim P$              not $P$


The truth values ($T$ or $F$) for various logic cases are shown in Table C.1. Each row of the table shows
the logical truth value for conjunctions and implications formed from $P$ and $Q$ based on the given
truth values in the first two columns. Notice that the columns of Table C.1 corresponding to $P \wedge \sim Q$
and $P \Rightarrow Q$ have exactly opposite truth values. This will be very useful later when we discuss proof
by contradiction. Notice, also, that $P \Rightarrow Q$ and $\sim Q \Rightarrow \sim P$ have the same truth value. This will be
very important when discussing the contrapositive.


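$P$ | $Q$ | $\sim P$ | $P \wedge Q$ | $P \vee Q$ | $P \Rightarrow Q$ | $P \wedge \sim Q$ | $\sim Q \Rightarrow \sim P$ | $P \Leftrightarrow Q$
 T  |  T  |    F     |      T       |     T      |         T         |         F         |              T              |           T
 T  |  F  |    F     |      F       |     T      |         F         |         T         |              F              |           F
 F  |  T  |    T     |      F       |     T      |         T         |         F         |              T              |           F
 F  |  F  |    T     |      F       |     F      |         T         |         F         |              T              |           T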


Table C.1 A truth table for conjunctions and implications formed using statements P and Q. 
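
The entries of a truth table such as Table C.1 can be generated mechanically, which also gives a quick check of the two observations about the contrapositive and about $P \wedge \sim Q$. This is a small sketch; the function names and the column labels used as dictionary keys are illustrative choices.

```python
from itertools import product


def implies(p, q):
    """P => Q is false only when P is true and Q is false."""
    return (not p) or q


def truth_table():
    """Return one row per assignment of P and Q, in the order TT, TF, FT, FF."""
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append({
            "P": p,
            "Q": q,
            "not P": not p,
            "P and Q": p and q,
            "P or Q": p or q,
            "P => Q": implies(p, q),
            "P and not Q": p and (not q),
            "not Q => not P": implies(not q, not p),
            "P <=> Q": p == q,
        })
    return rows


for row in truth_table():
    print(" ".join("T" if value else "F" for value in row.values()))
```

Because there are only four assignments of truth values to $P$ and $Q$, checking every row is a complete verification of these logical equivalences, not just a spot check.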


C.2 Proof Structure


In this section, we describe what we consider a good general structure for a proof. This should only 
serve as a starting point for your proof writing. You can (and should!) add your own voice to your 
proofs without losing the necessary components for a good proof. 

Here is the general structure with which you should write your proofs. 


Statement

This is the statement that you will be proving.

Proof. ← Every proof should begin with this. In cases of particular types of proofs, you may
indicate the technique you will be using. More on that later.

Hypotheses: Here, you list all the relevant assumptions. You use words like Assume that... or
Suppose... or Let...

Plan: For more complicated proofs, you give the reader an idea about how the proof will work.
We will show that...

Proof Body: Here is where you begin listing statements that link to definitions, theorems, algebra,
arithmetic,... these will all lead to the result being proved. These statements should create a path
that can be traversed to get from the hypotheses to the conclusion. It is important to know your
audience. Leaving large gaps between statements can make it difficult or impossible for your
reader to traverse the argument.

Conclusion: Here, you state what you just proved.

Every proof should end with some sort of symbol indicating your proof is done. One very common
example is the symbol here. → □


Now, we move to various proof techniques. In these examples, we use only one proof technique. In 
writing proofs, you may use multiple techniques in a single proof. 


C.3 Direct Proof 


In this section, we will show some examples of writing direct proofs. Whenever a direct proof is
achievable, it is generally preferable to any other type of proof. This is
because direct proofs tend to be more straightforward and easier to follow. Of course, this is not always
the case and we discuss other proof techniques in upcoming sections.

Proofs use theorems and definitions to make a clear argument about the truth of another statement. 
To illustrate this, we will use the following definition. 


Definition C.3.1 


A number $n \in \mathbb{Z}$ is even if there exists $k \in \mathbb{Z}$ so that $n = 2k$.


In the following examples, we will add comments in blue to emphasize important features. 


Example C.3.2 If $n \in \mathbb{Z}$ is even, then $n^2$ is even.


Proof. Suppose $n \in \mathbb{Z}$ is even.                     | We always start by telling the reader what we are assuming.
We will find $k \in \mathbb{Z}$ so that $n^2 = 2k$.            | We follow up with our goal or our plan.
We know that $n = 2m$ for some $m \in \mathbb{Z}$.             | Write what our assumptions mean. We can work with this information.
Notice $n^2 = 4m^2 = 2(2m^2)$.                                 | We use algebra to reach our goal.
Since $2m^2 \in \mathbb{Z}$, we can use the definition of      | We always end a proof stating the result.
even to say that $n^2$ is even. □                              |
Example C.3.3 If $n$ and $m$ are both even integers, then $n + m$ is an even integer.


Proof. Suppose $n$ and $m$ are even integers.                  | Write the assumptions.
We will show that there is a $k \in \mathbb{Z}$ so that        | Indicate the plan.
$n + m = 2k$.                                                  |
We know that there are integers $x$ and $y$ so that            | Write what the assumptions imply.
$n = 2x$ and $m = 2y$.                                         |
Then $n + m = 2x + 2y = 2(x + y)$.                             | Use algebra to write a statement implied by the last statement.
Since $x + y \in \mathbb{Z}$, $n + m$ is even. □               | The conclusion follows from the previous statement.


All the assumptions in the above proofs came from the hypothesis of the statement. And, the proof 
ended with the conclusion of the statement. In the next example, we will refrain from adding the blue 
comments. Try to find the same elements in this next proof. 


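Example C.3.4 Suppose $x, y \in \mathbb{R}$. If $x$ and $y$ are positive then $x + y \geq 2\sqrt{xy}$.


Proof. Assume $x, y \in \mathbb{R}$ are positive. We will show that $x + y - 2\sqrt{xy} \geq 0$. Notice that

$$\begin{aligned}
x - 2\sqrt{xy} + y &= (\sqrt{x})^2 - 2\sqrt{x}\sqrt{y} + (\sqrt{y})^2 \\
&= (\sqrt{x} - \sqrt{y})(\sqrt{x} - \sqrt{y}) \\
&= (\sqrt{x} - \sqrt{y})^2 \geq 0.
\end{aligned}$$


Thus, $x + y \geq 2\sqrt{xy}$.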


In linear algebra, we prove many things about sets. Here, we show some examples of proofs about 
sets. In the beginning of this text, we seek to know whether certain objects lie in a given set. Let’s see 
how such a proof might look. 


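Example C.3.5 Let $A = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \,\middle|\, a, b, c, d \in \mathbb{R} \right\}$. If $u, v \in A$ then $u + v \in A$.


Proof. Suppose $A$ is the set given above and suppose that $u, v \in A$. We will show that there
are real numbers $\alpha, \beta, \gamma$, and $\delta$ so that

$$u + v = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}.$$

We know that there are real numbers $a_1, b_1, c_1, d_1, a_2, b_2, c_2$, and $d_2$ so that

$$u = \begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix} \quad \text{and} \quad v = \begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix}.$$


Notice that

$$u + v = \begin{pmatrix} a_1 + a_2 & b_1 + b_2 \\ c_1 + c_2 & d_1 + d_2 \end{pmatrix}.$$

Since $a_1 + a_2$, $b_1 + b_2$, $c_1 + c_2$, and $d_1 + d_2$ are all real numbers, we know that $u + v \in A$.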


Let us consider one more proof of this form. 


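Example C.3.6 Suppose $P = \{ax^2 + bx + c \mid a, b, c \in \mathbb{R} \text{ and } a + b - 2c = 0\}$. If $v \in P$ then $\alpha v \in P$
for all $\alpha \in \mathbb{R}$.


Proof. Suppose $P$ is the set given above. Let $\alpha \in \mathbb{R}$ and $v \in P$. We will show that $\alpha v = ax^2 + bx + c$
for some $a, b, c \in \mathbb{R}$ satisfying $a + b - 2c = 0$. We know that there are real numbers $r, s$, and $t$ so
that $r + s - 2t = 0$ and $v = rx^2 + sx + t$. Notice that

$$\alpha v = \alpha r x^2 + \alpha s x + \alpha t.$$

Notice also that $\alpha r$, $\alpha s$, and $\alpha t$ are real numbers satisfying

$$\alpha r + \alpha s - 2\alpha t = \alpha(r + s - 2t) = \alpha \cdot 0 = 0.$$


Thus, $\alpha v \in P$. Since $\alpha$ was an arbitrarily chosen real number, we conclude that $\alpha v \in P$ for all $\alpha \in \mathbb{R}$.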


To compare sets, we need the following definition. 


Definition C.3.7 


Let $A$ and $B$ be sets. We say that $B$ is a subset of $A$ and write $B \subseteq A$ if $x \in B$ implies $x \in A$.
We say that $A = B$ if $A \subseteq B$ and $B \subseteq A$.


We will use the definition to show the following statement. 
Example C.3.8 Let
$A = \{2x + 3y \mid x, y \in \mathbb{Z}\}$ and $B = \{6n + 12m \mid n, m \in \mathbb{N}\}$.
Then $B \subseteq A$.


Proof. Suppose $A$ and $B$ are the sets given above and suppose $b \in B$. We will show that there are
integers $x, y$ so that $b = 2x + 3y$. We know that there are natural numbers $n, m$ so that

$$\begin{aligned}
b &= 6n + 12m \\
&= 2(3n) + 3(4m).
\end{aligned}$$

Now, since $3n$ and $4m$ are integers, we see that $b \in A$. Thus, $B \subseteq A$.
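
The witnesses used in this proof can be spot-checked by computer. This is a sketch only, and a finite check is not a substitute for the proof: the range of $n$ and $m$ tested, the convention that natural numbers start at 1, and the function names are assumptions made for illustration.

```python
def in_A_with_witness(b, x, y):
    """Check that integers x and y witness b = 2x + 3y, i.e., that b is in A."""
    return isinstance(x, int) and isinstance(y, int) and b == 2 * x + 3 * y


def witnesses_for_B_element(n, m):
    """For b = 6n + 12m in B, the proof's choice of witnesses is x = 3n and y = 4m."""
    return 3 * n, 4 * m


checked = 0
for n in range(1, 21):
    for m in range(1, 21):
        b = 6 * n + 12 * m
        x, y = witnesses_for_B_element(n, m)
        assert in_A_with_witness(b, x, y)
        checked += 1
print(checked)  # number of elements of B that were tested
```

The code mirrors the structure of the proof: to show membership in $A$, it is enough to exhibit one pair of integers $x$ and $y$ that works.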


In some instances, we are faced with proving a statement for which a direct proof is difficult. An
example is a statement of the form $\sim P \Rightarrow \sim Q$. An example follows.


Example C.3.9 Let $x \in \mathbb{R}$. If $x^2 + 2x - 3 \neq 0$, then $x \neq 1$ and $x \neq -3$.


Proof. Let $x \in \mathbb{R}$. Suppose $x^2 + 2x - 3 \neq 0$. We will show that $x \in \{a \in \mathbb{R} \mid a \neq 1 \text{ and } a \neq -3\}$.
Notice that

$$x^2 + 2x - 3 = (x - 1)(x + 3).$$


We see that neither $x - 1$ nor $x + 3$ can be zero. So then $x \neq 1$ and $x \neq -3$.


The above proof feels a bit forced at the end. We see that our understanding of quadratic polynomials 
and lines helps us to see the ending statements must be true. But, it feels less like a proof and more 
like a list of statements we just know. In the next section we will prove this statement again, but with 
a smoother proof. 


C.4 Contrapositive 


Recall, from Table C.1, that the statement P => Q is equivalent to ~ Q => ~ P. In this section,
we discuss when it is more appropriate to prove ~ Q => ~ P as a means to proving P => Q. We
begin by defining the contrapositive of a statement.


Definition C.4.1 
Let P and Q be two statements. The contrapositive of the statement
If P is true then Q is true. (P => Q.)


is the statement 


If Q is false then P is false. (~ Q => ~ P.) 
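
The equivalence of a statement and its contrapositive can be confirmed by listing all four truth assignments. The sketch below is an illustration only; the helper names `implies` and `contrapositive_equivalent` are our own, and the comparison with the converse Q => P is included just for contrast.

```python
from itertools import product


def implies(p, q):
    """Truth value of the implication p => q."""
    return (not p) or q


def contrapositive_equivalent():
    """True if P => Q and ~Q => ~P agree on every truth assignment."""
    return all(
        implies(p, q) == implies(not q, not p)
        for p, q in product([True, False], repeat=2)
    )


def converse_equivalent():
    """True if P => Q and Q => P agree on every truth assignment."""
    return all(
        implies(p, q) == implies(q, p)
        for p, q in product([True, False], repeat=2)
    )


print(contrapositive_equivalent(), converse_equivalent())
```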


Now, we return to Example C.3.9. 


Example C.4.2 Let x ∈ ℝ. Suppose x² + 2x − 3 ≠ 0, then x ≠ 1 and x ≠ −3.


Proof. Let x ∈ ℝ. [We write our assumptions.]
Instead, we will prove the contrapositive. [Here, we make clear the proof style we will use.]
That is, we will show

If x = 1 or x = −3 then x² + 2x − 3 = 0. [This is the contrapositive.]

We will use cases on x to prove this statement. [Whenever there is an “or” in the hypothesis,
we consider cases assuming each of the smaller statements.]
First, assume x = 1. Then x − 1 = 0 and
x² + 2x − 3 = (x − 1)(x + 3) = 0 · 4 = 0.
Now, assume x = −3. Then x + 3 = 0 and
x² + 2x − 3 = (x − 1)(x + 3) = −4 · 0 = 0.
In either case, we see that x² + 2x − 3 = 0. □ [We finish with the result.]


Notice that proving the contrapositive flowed much more smoothly than trying a direct proof. We 
see that this statement sets itself up to be proved using the contrapositive because both the hypothesis 
and the conclusion have “not equal.” Negating these statements changes them to “equals,” which is
much easier to prove and much easier to use as an assumption. Let’s look at another example where 
the contrapositive is clearly easier to prove. 
Example C.4.3 Let X, Y be sets. If x ∉ X then x ∉ X ∩ Y.


Proof. Let X, Y be sets. We will, instead, show that the contrapositive is true. That is, we will prove
If x ∈ X ∩ Y then x ∈ X.


Suppose x ∈ X ∩ Y. By definition of the intersection, this means that x ∈ X and x ∈ Y. Thus, if x ∉ X
then x ∉ X ∩ Y.


In some instances, a proof of the contrapositive is useful even though the hypothesis and conclusion 
statements do not explicitly have “not” in them. 


Example C.4.4 Suppose m, n ∈ ℤ. If mn is even, then m or n is even.


Proof. Let n, m ∈ ℤ. We will show, instead, the contrapositive. That is, we will show

If m and n are odd, then mn is odd.

Now, suppose m and n are odd. Then there are integers k and ℓ so that m = 2k + 1 and n = 2ℓ + 1.
Notice that

mn = (2k + 1)(2ℓ + 1)
   = 4kℓ + 2k + 2ℓ + 1
   = 2(2kℓ + k + ℓ) + 1.


Since 2kℓ + k + ℓ ∈ ℤ, mn is odd. Thus, if mn is even, m or n is even.
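
The algebra in the proof above can be spot-checked on a range of odd integers. This is a rough sketch under our own naming (`odd_product_decomposition` is hypothetical); it verifies the identity mn = 2(2kℓ + k + ℓ) + 1 on finitely many cases and is not a substitute for the proof.

```python
def odd_product_decomposition(m, n):
    """For odd m, n return q = 2kl + k + l, where m = 2k + 1 and n = 2l + 1."""
    k = (m - 1) // 2
    l = (n - 1) // 2
    return 2 * k * l + k + l


def check_odd_products(limit=25):
    """Check mn = 2q + 1 for all odd m, n with |m|, |n| <= limit."""
    odds = [t for t in range(-limit, limit + 1) if t % 2 == 1]
    return all(m * n == 2 * odd_product_decomposition(m, n) + 1
               for m in odds for n in odds)


print(check_odd_products())
```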


All statements in this section have a contrapositive because they are all implications. In subsequent
sections, we will introduce other methods of proof that are not only useful for proving some implications 
but can also be used to prove statements that are not implications. 


C.5 Proof by Contradiction 


In this section, we discuss the method of proof by contradiction. We know that P and ~ P have
opposite truth values and we also see in Table C.1 that P => Q and P ∧ ~ Q have opposite truth
values. That is, when one is true, the other is false. That is, ~ (P => Q) and P ∧ ~ Q have the same
truth value. We will make use of this fact in this section. Here, we outline the method.


Statement: P 


Proof. (By Contradiction) By way of contradiction, assume ~ P is true. (That is, P is false.) We 
will search for an absurd statement. 


Logical path through statements starting with a statement that follows from the assumption. 


A clearly false statement. 
→← (the symbol to recognize a contradiction.)
Thus, the original assumption must be false and P must be true. 
One of the most common first proofs by contradiction is seen in the following example. First, we
define the set of rational numbers. 


Definition C.5.1 


Let x be a real number. We say that x is rational (or that x ∈ ℚ) if there are integers a, b, with 1
being the only common divisor, so that x = a/b.


For a number to be irrational, it must be real, but not rational. Notice as you read the statement 
that it is not an implication and cannot be proved using a contrapositive. In the following example, we 
begin by assuming that √2 is rational. This, after some algebra, leads to an absurd statement.


Example C.5.2 The number √2 is irrational.


Proof. (By Contradiction) [Indicate to the reader the type of proof.]
Assume the statement is false. That is, assume that √2 ∈ ℚ. [Write the negation of the statement.]
We will search for a contradiction. [Give the reader your plan.]
Then there are a, b ∈ ℤ so that √2 = a/b and a, b share only 1 as a common divisor. [Here, the proof feels the same.]
So 2b² = a².
This means that a² is even, and so a is even.
Therefore, there is an integer k so that a = 2k.
This means that 2b² = 4k².
Thus, b² = 2k², meaning that b is even.
So, 2 is a common factor of a and b.
This means that 2 = 1. →← [Clearly indicate the absurd statement.]
Therefore, our assumption that √2 is rational was false and so √2 must be irrational. □
[State what is actually true by recognizing the false assumption and writing the true statement.]


In the next example, we prove an implication using a proof by contradiction. Recall that ~ (P =>
Q) and P ∧ ~ Q have the same truth value. So, in such a case, we begin by assuming the implication
is false by assuming P is true and Q is false. (Typically, we negate the statement Q and assume the
negation is true.)


Statement: P => Q 
Proof. (By Contradiction) By way of contradiction, assume the statement is false. That is, assume 
P and ~ Q are true. We will search for an absurd statement. 


Logical path through statements starting with a statement that follows from the assumption. 


A clearly false statement. 
→← (the symbol to recognize a contradiction.)
Thus, the original assumption must be false and P => Q must be true.


Example C.5.3 If X and Y are sets, then X ∩ (Y ∪ X)^c = ∅.
If A is a set, then A^c contains all elements from some universal set, except those in A.


Proof. (By Contradiction) By way of contradiction, suppose the statement is false. That is, suppose
X and Y are sets and X ∩ (Y ∪ X)^c ≠ ∅. We will search for a contradiction. We know that there is
an element x ∈ X ∩ (Y ∪ X)^c. This means that x ∈ X and x ∉ Y ∪ X. Since x ∉ Y ∪ X, x can be an
element of neither X nor Y. That is, x ∉ X. →← Therefore, our original assumption that X ∩ (Y ∪ X)^c ≠
∅ is false. So X ∩ (Y ∪ X)^c = ∅.


Let us now spend a little time making sense of the logic behind a proof by contradiction. We do this 
with the following illustration. 

Suppose we have a statement P for which we want to find a proof. Consider the following logical 
path of statements in a hypothetical proof by contradiction. 


Statement path: ~ P => R => Q


Now, suppose that Q is an obviously false statement. Since every statement that we put into a proof
must be true (except for possibly our assumptions), we can follow the following logical progression
backwards to find the truth of P. We do this using the fact that given two statements S₁ and S₂, if S₂ is
false and S₁ => S₂ is true, then S₁ must also be false (see Table C.1 column 7).


Statement     Truth Value    Reason

Q             False          Q is the obviously absurd statement.
R => Q        True           Statement in the proof
R             False          Table C.1, column 7
~ P => R      True           Statement in the proof
~ P           False          Table C.1, column 7
P             True           Table C.1, column 3

C.6 Disproofs and Counterexamples 


Up to this point, we have proved statements. In this section, we talk about how to show that a statement 
is false. We have two choices to indicate how we know a statement is false: using a counter example or 
writing a disproof. It is not the case that both choices work for any false statement. Let’s look at some 
examples. 


Example C.6.1 Let x ∈ ℝ then x² − 5x + 4 ≥ 0.

Note: In this example, we see that the statement says that no matter which real number we choose for
x, we will get a nonnegative result for x² − 5x + 4. To show that the statement is false, we only need
to choose one real number for x that makes x² − 5x + 4 negative.

Counter Example. Let x = 2. Then x² − 5x + 4 = 4 − 10 + 4 = −2 < 0.

Consider the following example for which a counter example is not attainable.


Example C.6.2 There exist prime numbers p and q for which p + q = 107.


Disproof: Assume the statement is true. We will show that this leads to a contradiction. Let p and q
be prime numbers so that p + q = 107. Since 107 is odd, we know that p and q cannot both be odd
(nor can they both be 2). Thus, either p = 2 or q = 2. Without loss of generality, let us assume p = 2.
Then q = 105 = 15 · 7, so q is not prime. →← Therefore, there are no two primes whose sum is 107.

Notice that we used the words “without loss of generality” in this disproof. We use this phrase to
indicate that had we chosen q = 2, the proof would not have been different except to say the exact
same things about p as we said about q.
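
Because only finitely many pairs of primes could possibly sum to 107, the conclusion of Example C.6.2 can also be confirmed by exhaustive search. The sketch below uses simple trial division; the function names `is_prime` and `prime_pairs_with_sum` are our own illustrative choices.

```python
def is_prime(n):
    """Trial-division primality test for small integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_pairs_with_sum(total):
    """List all pairs (p, q) of primes with p <= q and p + q == total."""
    return [(p, total - p)
            for p in range(2, total // 2 + 1)
            if is_prime(p) and is_prime(total - p)]


print(prime_pairs_with_sum(107))  # no pairs are found
print(prime_pairs_with_sum(10))   # for contrast: (3, 7) and (5, 5)
```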

In linear algebra, many times, we want to prove or disprove whether a list of properties is satisfied. 
In the next example, we show that a set V is not a vector space (see Definition 2.3.2), using a disproof. 


Example C.6.3 Let V = {ax + b | a, b ∈ ℝ and a + b = 1}. V is a vector space.


Counter Example. We notice that V is not closed under addition. Indeed, let u = 1x + 0 and v =
0x + 1, then u, v ∈ V, but the sum u + v = x + 1 ∉ V. Thus, V is not a vector space.
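
The counterexample can be replayed in a few lines. In this sketch we assume, purely for illustration, that a polynomial ax + b is stored as the pair (a, b); the names `in_V` and `add` are hypothetical helpers.

```python
def in_V(poly):
    """Membership test for V: the coefficients of ax + b must satisfy a + b = 1."""
    a, b = poly
    return a + b == 1


def add(u, v):
    """Add two polynomials ax + b stored as coefficient pairs (a, b)."""
    return (u[0] + v[0], u[1] + v[1])


u = (1, 0)  # the polynomial 1x + 0
v = (0, 1)  # the polynomial 0x + 1
print(in_V(u), in_V(v), in_V(add(u, v)))
```

Both u and v pass the membership test, while their sum x + 1 has coefficients adding to 2 and so fails it.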


C.7 The Principle of Mathematical Induction 


In this section, we discuss statements of the form P(n), where, for each n ∈ ℕ, we have a different
statement. If we know that for any n ≥ 1, P(n + 1) can be inferred from the statement P(n), then we
can apply the principle of mathematical induction. The idea is to think of this like a domino effect.
If you push the first domino, then the next will fall, causing the toppling of the entire design as in
Figure C.1.


Fig. C.1 Mathematical Induction is like toppling dominoes.

Theorem C.7.1 (The Principle of Mathematical Induction)
Suppose P(n) is a statement defined for every n ∈ ℕ. If


(1) P(1) is true and
(2) Whenever P(k) is true for some k ∈ ℕ, then P(k + 1) is also true (P(k) => P(k + 1)),


then P(n) is true for all n ∈ ℕ.


Next, we prove a statement using mathematical induction to see how this works. We will use a 
statement we use in Calculus. 


Example C.7.2 For any n ∈ ℕ,

$$\sum_{j=1}^{n} j = \frac{n(n+1)}{2}.$$


Proof. (By Induction) [We indicate the type of proof at the start.]
First, we define the statement P(n) as follows: $\sum_{j=1}^{n} j = \frac{n(n+1)}{2}$.
[This step is not necessary, but does ease notation.]

(1) Notice that P(1) is the trivial statement $1 = \frac{1 \cdot 2}{2}$. [Base Case: P(1) is true.]

(2) Let k ∈ ℕ. Now, we will show that P(k) => P(k + 1). [Here, we use a direct proof.]

Suppose P(k) is true. That is, suppose $\sum_{j=1}^{k} j = \frac{k(k+1)}{2}$.
[This is called the “induction hypothesis.”]

We will show that $\sum_{j=1}^{k+1} j = \frac{(k+1)(k+2)}{2}$ is also true.
Notice that

$$\sum_{j=1}^{k+1} j = (k+1) + \sum_{j=1}^{k} j.$$

Using the Induction Hypothesis, we have [In every proof by induction you will employ the induction hypothesis.]

$$\sum_{j=1}^{k+1} j = (k+1) + \frac{k(k+1)}{2} = \frac{2(k+1) + k(k+1)}{2} = \frac{(k+1)(k+2)}{2}.$$

Thus, P(k) => P(k + 1). [General case is true.]
Therefore, by the Principle of Mathematical Induction, $\sum_{j=1}^{n} j = \frac{n(n+1)}{2}$ is true for all n. □
[We call upon the principle of mathematical induction to use the truth of the base case
and the general case to establish the statement being proved.]
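
A short computation can illustrate both parts of the induction argument: the closed form itself, and the algebraic identity used in the inductive step. This is only a sketch with hypothetical helper names (`formula_holds`, `induction_step_holds`); checking finitely many values does not replace the induction.

```python
def formula_holds(n):
    """Compare the sum 1 + 2 + ... + n with the closed form n(n+1)/2."""
    return sum(range(1, n + 1)) == n * (n + 1) // 2


def induction_step_holds(k):
    """Check the identity k(k+1)/2 + (k+1) = (k+1)(k+2)/2 used in step (2)."""
    return k * (k + 1) // 2 + (k + 1) == (k + 1) * (k + 2) // 2


print(all(formula_holds(n) for n in range(1, 200)))
print(all(induction_step_holds(k) for k in range(1, 200)))
```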


In reading through the above proof, you can see the base case is the same as hitting the first domino. 
Then, because the general case is true, the second domino is hit by the first. In turn, the third domino 
is hit by the second, and so on. We get the following sequence of statements: 
P(1)            True
P(1) => P(2)    True
P(2)            True
P(2) => P(3)    True
P(3)            True

Here, we show another example. 


Example C.7.3 Let n ∈ ℕ, then $(1 + x)^n \ge 1 + nx$ for all x ∈ ℝ, x > −1.

Proof. (By Induction) Let n ∈ ℕ, x ∈ ℝ, and x > −1. Define P(n) to be the statement
$(1 + x)^n \ge 1 + nx$.

(1) Notice that P(1) is the trivial statement (1 + x) ≥ (1 + x). Thus, P(1) is true.
(2) Fix k ∈ ℕ. We will show that P(k) => P(k + 1). That is, if $(1 + x)^k \ge 1 + kx$ then

$$(1 + x)^{k+1} \ge 1 + (k+1)x.$$

Suppose $(1 + x)^k \ge 1 + kx$. Notice that

$$(1 + x)^{k+1} = (1 + x)^k (1 + x) \quad\text{and}\quad 1 + x > 0.$$


Thus, by the induction hypothesis,

$$\begin{aligned}
(1 + x)^{k+1} &\ge (1 + kx)(1 + x) \\
&= 1 + kx + x + kx^2 \\
&= 1 + (k+1)x + kx^2.
\end{aligned}$$


Since kx² ≥ 0, we know that $(1 + x)^{k+1} \ge 1 + (k+1)x$. Thus, P(k) => P(k + 1).
By the principle of mathematical induction, we know that $(1 + x)^n \ge 1 + nx$ for all n ∈ ℕ.
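
The inequality can be spot-checked with exact rational arithmetic, which avoids floating-point rounding near equality. The sample grid of x values and the helper name `bernoulli_holds` are our own assumptions for illustration; the last line shows that the inequality can fail once the hypothesis x > −1 is dropped.

```python
from fractions import Fraction


def bernoulli_holds(x, n):
    """Check (1 + x)^n >= 1 + n*x for a rational x and a positive integer n."""
    return (1 + x) ** n >= 1 + n * x


# Sample values x = -3/4, -2/4, ..., 12/4, all satisfying x > -1.
xs = [Fraction(i, 4) for i in range(-3, 13)]
print(all(bernoulli_holds(x, n) for x in xs for n in range(1, 11)))

# Outside the hypothesis: x = -4 and n = 3 give (-3)^3 = -27 < 1 - 12 = -11.
print(bernoulli_holds(Fraction(-4), 3))
```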


In both examples, P(1) was a trivial statement. This is not always the case. It should also be noted
that if we were to prove a statement that is only true for all n ∈ {k ∈ ℕ | k ≥ N} for some given number
N, the only change to an induction proof is the base case. This is, in essence, the same as knocking
over the 10th domino and then seeing the toppling effect happen from the 10th domino onward.


C.8 Etiquette 


With some basics on proof-writing down, we now turn to the etiquette of proof-writing. It should be 
noted that the suggestions, rules, and guidelines in this section may be slightly different than those 
another mathematician will give. We try to give very basic guidelines here. You should know your
audience when writing proofs and be flexible enough to adjust your writing so that it communicates
clearly with that audience. To better understand the need for such etiquette, it is
a good exercise to understand what makes mathematics easier for you to understand. The following is
a compilation of questions that you can ask yourself to begin understanding some etiquette you would
like to have in place. It may be that we do not agree on etiquette, but it is always good for you to form
an opinion of your own. With all that said, read through and answer the following questions to determine your
preference for each. Unfortunately, we will not give you a character or personality profile at the end.
This is only for your own use.


1. Which of the following is most clear? 


(a) ∀u, v ∈ V, v + u = u + v.

(b) For every pair of elements, u, v ∈ V, u + v = v + u.

(c) The set V has the commutative property for addition. 

(d) The set V has the property so that if you take two elements and add them together in one order, 
you get the same thing as when you add them together in the other order. 


2. Which of the following best defines m as a multiple of k?


(a) Let k ∈ ℤ, then m is a multiple of k if m = kn.

(b) m = kn.

(c) Let k ∈ ℤ, then m is a multiple of k if m = kn for some n ∈ ℕ.

(d) m and k are integers and n is a natural number and m = kn means that m is a multiple of k.


3. When reading mathematics, which of the following do you prefer?

(a) Let k ∈ ℤ. Suppose x = 5k − 6 and y = 3k + 2. Then x + y = 5k − 6 + 3k + 2 =
5k + 3k − 6 + 2 = 8k − 4 = 4(2k − 1).

(b) Let k ∈ ℤ. Suppose x = 5k − 6 and y = 3k + 2. Then

x + y = 5k − 6 + 3k + 2
      = 8k − 4
      = 4(2k − 1).


(c) Let k ∈ ℤ. Suppose x = 5k − 6 and y = 3k + 2. Then x + y = 4m where m ∈ ℤ.

(d)

x + y = 5k − 6 + 3k + 2
x + y = 8k − 4
x + y = 4(2k − 1).

4. Which of the following reads better? 


(a) Let x ∈ ℤ be even and y ∈ ℤ be odd. Then x = 2k for some k ∈ ℤ and y = 2m + 1 for some
m ∈ ℤ. Then x + y = 2k + 2m + 1. Then x + y is odd.

(b) Let x, y ∈ ℤ and let x be even and y be odd. Then there are integers k and m so that x = 2k
and y = 2m + 1. Thus, x + y = 2k + 2m + 1. Therefore, x + y is odd.


5. Which of the following flows better? 


(a) Let x ∈ ℝ be positive and suppose x² − 2x − 3 = 0. We will first factor to get (x − 3)(x +
1) = 0. Setting each factor to zero gives x − 3 = 0 and x + 1 = 0. In the first, we add 3 to
both sides to get x = 3. Subtracting 1 on both sides in the second gives x = −1. Since x is
positive, x = 3.

(b) Let x ∈ ℝ be positive and suppose x² − 2x − 3 = 0. Notice that

0 = x² − 2x − 3
  = (x − 3)(x + 1).

Thus, x = 3 or x = −1. Since x is positive, x = 3.

(c)

x² − 2x − 3 = 0
(x − 3)(x + 1) = 0
x − 3 = 0, x + 1 = 0
x = 3 or x = −1


(d) Let x ∈ ℝ be positive and suppose x² − 2x − 3 = 0. We can solve the quadratic equation
x² − 2x − 3 = 0 by factoring. This leads to (x − 3)(x + 1) = 0. To solve this equation, we
set each factor to zero giving us two equations. These equations are x − 3 = 0 and x + 1 = 0.
To solve x − 3 = 0, we add 3 to both sides. We get the solution x = 3. To solve the equation
x + 1 = 0, we subtract 1 from both sides. We get the solution x = −1 which is not positive.
So our only solution is x = 3.


6. Which of the following best shows that the sum rule for differentiation is true? 
(a) Proof. Let f(x) = x + 1 and g(x) = x² − 2x + 1. Then f′(x) = 1 and g′(x) = 2x − 2.
So f′(x) + g′(x) = 1 + 2x − 2 = 2x − 1. f(x) + g(x) = x + 1 + x² − 2x + 1 = x² − x + 2.
So (f(x) + g(x))′ = 2x − 1.
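(b) Proof. Let f and g be differentiable functions. Then, by definition, both

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \quad\text{and}\quad g'(x) = \lim_{h \to 0} \frac{g(x+h) - g(x)}{h}$$

exist.

$$\begin{aligned}
(f + g)'(x) &= \lim_{h \to 0} \frac{(f+g)(x+h) - (f+g)(x)}{h} \\
&= \lim_{h \to 0} \frac{f(x+h) - f(x) + g(x+h) - g(x)}{h} \\
&= \lim_{h \to 0} \left( \frac{f(x+h) - f(x)}{h} + \frac{g(x+h) - g(x)}{h} \right) \\
&= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} + \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} \\
&= f'(x) + g'(x).
\end{aligned}$$

(c) Proof.

$$\begin{aligned}
(f + g)'(x) &= \lim_{h \to 0} \frac{(f+g)(x+h) - (f+g)(x)}{h} \\
&= \lim_{h \to 0} \frac{f(x+h) - f(x) + g(x+h) - g(x)}{h}
\end{aligned}$$

$$f'(x) + g'(x) = \lim_{h \to 0} \left( \frac{f(x+h) - f(x)}{h} + \frac{g(x+h) - g(x)}{h} \right)$$

So

$$\lim_{h \to 0} \left( \frac{f(x+h) - f(x)}{h} + \frac{g(x+h) - g(x)}{h} \right) = \lim_{h \to 0} \frac{f(x+h) - f(x) + g(x+h) - g(x)}{h}.$$

Therefore,

$$\lim_{h \to 0} \left( \frac{f(x+h) - f(x)}{h} + \frac{g(x+h) - g(x)}{h} \right) = \lim_{h \to 0} \left( \frac{f(x+h) - f(x)}{h} + \frac{g(x+h) - g(x)}{h} \right).$$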


7. Which of the following recognizes that the results are part of a collective effort by mathematicians? 


(a) Let x, y ∈ ℕ be even. We will show that x + y is also even. We know that there are integers
m, n so that x = 2m and y = 2n. Thus, x + y = 2m + 2n = 2(m + n). Therefore, we have
that x + y is also even.

(b) Let x, y ∈ ℕ be even. I will show that x + y is also even. I know that there are integers m, n so
that x = 2m and y = 2n. Thus, x + y = 2m + 2n = 2(m + n). Therefore, I have that x + y
is also even.

(c) Let x, y ∈ ℕ be even. They will show that x + y is also even. They know that there are integers
m, n so that x = 2m and y = 2n. Thus, x + y = 2m + 2n = 2(m + n). Therefore, they have
that x + y is also even.

(d) Let x, y ∈ ℕ be even. It will be shown that x + y is also even. It is known that there are
integers m, n so that x = 2m and y = 2n. Thus, x + y = 2m + 2n = 2(m + n). Therefore,
it has been shown that x + y is also even.


Now that you have completed the questionnaire, we will give some guidelines on the etiquette and 
rules used in proof writing. 


Etiquette Rule #1: Do not overly use words or symbols. Be concise, yet clear by mixing both. Only 
use terminology if it is clear the audience knows its meaning. 

In question 1 above, (a) is full of symbols and is harder to parse at a glance; (c) uses terminology
that is useful only to a reader already familiar with it; and (d) is so verbose that one can easily
get lost in the language. We choose (b) because it simply gives the statement with a mix of words
and symbols, keeping terminology to a minimum. Our second choice is (c).


In question 1, notice also that neither (b) nor (c) begins with symbols. This gives our next rule.


Etiquette Rule #2: Never begin a statement with symbols, but instead start with words.


Etiquette Rule #3: Always define your variables clearly, being sure, when appropriate, to add
quantifiers such as “there exist,” “for some,” and/or “for all.” This is less a matter of etiquette
than of correctness: the mathematics is improper if not done this way.

Notice in question 2. above, (a) does not define n so we cannot be sure that we are still talking about 
integers when discussing m and n. In (b), we see an equation and nothing more. In (d), we know the 
sets in which m,n, and k lie, but we are not sure which of these we are given and which we find. Our 
choice is (c) because it clearly fixes k and then m is defined as n multiplied by k and we are told it 
doesn’t matter which natural number n is as long as such an n exists. Really the idea is that we don’t 
have questions about which of k, m, and n are known and which are not nor do we question in which 
sets each are found. 
Etiquette Rule #4: In longer computations, determine which steps are key and which are obvious. Be
sure to leave in the key steps and leave out obvious steps that are not key to illustrating the goal. 

In question 3, response (a) has a long string of mathematical steps which flow from one line to the 
next. This is not visually appealing and can be difficult (depending on length) to follow.


Etiquette Rule #5: In longer computations, arrange your steps so that they are visually appealing, 
using vertical alignment to make the steps easier to follow. 

In question 3, response (c) leaves out all details, giving the reader work to do. Response (d) looks 
like a basic algebra student’s homework response. It is missing words and a clear path through the 
mathematics. Really, it is just a list of equations. We choose response (b). It is clear in (b) what we are 
assuming and how one gets from x + y to 4(2k — 1). Basic steps like using the commutative law of 
addition are left out. 


Etiquette Rule #6: Use transition words to make your proofs flow, but be sure to change words so that 
you are not transitioning with the same word every time. 

Question 4 has a clear winner (to us). We choose (b) because in (a), the repetition of “then” gets 
monotonous. It is nicer to read when the transition words change. 


Etiquette Rule #7: Do not treat your reader as if they are learning high school algebra. Rather, get to
the point using clear and concise statements, focusing on the level of mathematics for which the proof
is being written.

In question 5, (a), (c), and (d) all have a flavor resembling the work of a basic algebra
student, although (a) and (d) add some discussion, which places these two above (c), which is just
low-level scratch work. Notice in (b), the assumptions are clear and the transitive property of equality
clearly gets us from 0 to (x − 3)(x + 1) without an excess of words. We choose (b).


Etiquette Rule #8: Examples, in general, do not make proofs.

Question 6 shows two very common errors in writing proofs. In (a), an example is given to show a
general statement. There are many statements that are false in general but true for a particular
example. For example, it is not true that an even number times an even number is zero, but 0 · 2 = 0.


Etiquette Rule #9: When proving an equality or inequality, it is best to start with one side and pave a
clear path to the other. Never write a string of statements that leads to an obvious statement.

Continuing with question 6, we see in (c), the end result is a trivial statement and tells us no new information.
Notice that the second to last line is also a bad stopping place as it tells us something obvious, but does
not make the connection. Notice that in (b) (our choice), we obtain the clear result without any breaks
from the beginning, (f + g)′(x), to the end, f′(x) + g′(x).


Etiquette Rule #10: In mathematical writing, the pronoun we use is “we.” 

We as mathematicians have been building new maths by standing on the shoulders of those before 
us. If it is not clear with the very pointed question 7 or the way we have been writing this chapter, we 
hope to make it clear now that mathematicians use the pronoun “we” to indicate that we do all this 
together, using past mathematical work and the minds of the reader to verify new mathematical ideas. 
Looking at response (b), you should notice how very egocentric it sounds. In response (c), we still ask 
the age old question, who are “they?” Response (d) leads one to wonder who will do and has done this 
work? 


Appendix D 
Fields 


In this appendix, we give a very brief overview of what we need to know about fields. In this book, 
many examples required that you understand the two fields ℝ and Z_2. Here, we discuss other examples
as well. A more complete discussion of fields is the subject of abstract algebra courses. It should be 
noted that we are only scratching the surface of fields to glean what we need for linear algebra. We 
begin with the definition of a field. 


Definition D.0.1 
A field F is a set with operations + and · with the following properties:


• F is closed under + and ·: If a, b ∈ F then so are a + b and a · b.

• The operations + and · are associative: If a, b, c ∈ F then a + (b + c) = (a + b) + c and
a · (b · c) = (a · b) · c.

• There is an additive identity, that is, there is an element z ∈ F so that z + a = a for all a ∈ F.

• There is a multiplicative identity, that is, there is an element i ∈ F so that i · a = a for every
a ∈ F.

• There are multiplicative inverses: For all a ∈ F, a ≠ z, there is an element b ∈ F so that a · b = i.

• The operations + and · are commutative: If a, b ∈ F, then a + b = b + a and a · b = b · a.

• Every element has an additive inverse: If a ∈ F then there is an ã ∈ F so that a + ã = z.

• Multiplication distributes over addition: If a, b, c ∈ F, then a · (b + c) = a · b + a · c.


For the rest of this chapter, we will look at several examples of fields. We begin with the two fields 
we use throughout the text. 


Example D.0.2 In this example, we recognize that ℝ, the set of real numbers, with the usual definition
of + and · satisfies the field conditions. Many of these conditions are built into the operations. That is,
we define both operations to satisfy the commutative, associative, and distributive properties.

We know that 1 is the real number that acts as the multiplicative identity and 0 acts as the additive
identity. We know that we use the negative real numbers to act as additive inverses for the positive
numbers and vice versa (a ∈ ℝ has additive inverse −a) with 0 being its own additive inverse. We use
reciprocals as multiplicative inverses (if a ∈ ℝ and a ≠ 0 then 1/a ∈ ℝ is its multiplicative inverse).
Example D.0.3 Let p be a prime number and define the set Z_p = {0, 1, 2, ..., p − 1}. We define ⊕_p
(addition modulo p) and ⊙_p (multiplication modulo p) as follows:
For a, b, c ∈ Z_p,

$$a \oplus_p b = c \ \text{ means } \ \frac{c - (a + b)}{p} \in \mathbb{Z} \ \text{ and } \ 0 \le c < p,$$

$$a \odot_p b = c \ \text{ means } \ \frac{c - ab}{p} \in \mathbb{Z} \ \text{ and } \ 0 \le c < p.$$

Notice that the above says that a ⊕_p b is the remainder of (a + b) ÷ p and a ⊙_p b is the remainder of
(ab) ÷ p.
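
Since both operations are remainders, they map directly onto the integer remainder operator. The following is a minimal sketch; the function names `add_mod` and `mul_mod` are our own, and Python's `%` is used because it returns a value in {0, ..., p − 1} for a positive modulus.

```python
def add_mod(a, b, p):
    """Addition modulo p: the remainder of (a + b) divided by p."""
    return (a + b) % p


def mul_mod(a, b, p):
    """Multiplication modulo p: the remainder of (a * b) divided by p."""
    return (a * b) % p


p = 3
# Print the addition and multiplication tables of Z_3, one row per element.
for a in range(p):
    print([add_mod(a, b, p) for b in range(p)],
          [mul_mod(a, b, p) for b in range(p)])
```

For each result c, the quantity c − (a + b) (respectively c − ab) is a multiple of p, which matches the defining condition above.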


Theorem D.0.4
(Z_2, ⊕_2, ⊙_2) is a field.


Proof. Because the associative, distributive, and commutative properties are inherited from ℤ, we need
only show that Z_2 is closed under ⊕_2 and ⊙_2 and that there are additive and multiplicative identities
and inverses in Z_2. Closure properties for both ⊕_p and ⊙_p are evident by the fact that the remainder
of division by p is always a nonnegative integer less than p.

Since 0 ⊕_2 1 = 1 and 0 ⊕_2 0 = 0, we see that 0 is the additive identity for Z_2. The multiplicative
identity is 1 ∈ Z_2 since 1 ⊙_2 0 = 0 and 1 ⊙_2 1 = 1.

We can see that the additive inverse of every element of Z_2 is itself. So, Z_2 has additive inverses.
The only nonzero element of Z_2 is 1, and since 1 ⊙_2 1 = 1, the multiplicative inverse of 1 is itself.
Thus, every nonzero element of Z_2 has a multiplicative inverse.


Theorem D.0.5
(Z_3, ⊕_3, ⊙_3) is a field.


Proof. We see, from the proof that Z_2 is a field, that Z_3 is also closed under ⊕_3 and ⊙_3. Also, similar
to the proof for Z_2, we know that 0 and 1 are the additive and multiplicative identities, respectively.
So, we need only show the existence of inverses.

To show that additive inverses exist, we show a more general result. In Z_p, for a ≠ 0, a ⊕_p (p − a) is the
remainder of p ÷ p. Thus, a ⊕_p (p − a) = 0 and the additive inverse of a is p − a. Thus, Z_p has
additive inverses.

To show that multiplicative inverses exist in Z_3, we will find each one. Since 1 ⊙_p 1 = 1, the
multiplicative inverse of 1 is 1 in Z_p. We can also see that (p − 1) ⊙_p (p − 1) is the remainder of
(p² − 2p + 1) ÷ p. Thus, (p − 1) ⊙_p (p − 1) = 1 and p − 1 is its own multiplicative inverse in Z_p.
Thus, 1, 2 ∈ Z_3 both have multiplicative inverses.


We defined Z_p for prime numbers, p. If p is not prime, we run into problems finding multiplicative
inverses.

Example D.0.6 Consider Z_4 = {0, 1, 2, 3}. The element 2 has no multiplicative inverse in Z_4. Indeed,
0 ⊙_4 2 = 0, 1 ⊙_4 2 = 2, 2 ⊙_4 2 = 0, and 2 ⊙_4 3 = 2.


In considering Z_p for a general prime, p, we only mention the fact that it is a field. This fact is one
that students learn in an Abstract Algebra class and is beyond the scope of this text.


Fact D.0.7
For any prime number p, (Z_p, ⊕_p, ⊙_p) is a field.
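
For small moduli, the existence (or absence) of multiplicative inverses can be explored by brute force. This sketch is illustrative only: the function name `inverses` is hypothetical, and a search over a few values of n is evidence for Fact D.0.7, not a proof of it.

```python
def inverses(n):
    """Map each nonzero a in Z_n to a multiplicative inverse, or None if there is none."""
    return {
        a: next((b for b in range(1, n) if (a * b) % n == 1), None)
        for a in range(1, n)
    }


print(inverses(3))  # every nonzero element of Z_3 has an inverse
print(inverses(5))  # likewise for Z_5
print(inverses(4))  # the element 2 has no inverse in Z_4
```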


Example D.0.8 Let ℂ = {a + bi | a, b ∈ ℝ}. We call ℂ the set of complex numbers. We define ⊕
and ⊙ as follows.


(a + bi) ⊕ (c + di) = (a + c) + (b + d)i and (a + bi) ⊙ (c + di) = (ac − bd) + (ad + bc)i


Theorem D.0.9
The set of complex numbers ℂ with ⊕ and ⊙ defined as above is a field.


Proof. We will show each of the field properties. Let a, b, c, d, g, f ∈ ℝ. Notice that

[(a + bi) ⊕ (c + di)] ⊕ (f + gi) = [(a + c) + (b + d)i] ⊕ (f + gi) = ((a + c) + f) + ((b + d) + g)i.


Therefore, by the associative property of addition on ℝ, we see that the associative property for addition
on ℂ also holds.
Similarly, we see that the commutative property for addition is also inherited from the real numbers.
To see that the associative property of multiplication holds, we observe the following:

[(a + bi) ⊙ (c + di)] ⊙ (g + fi) = [(ac − bd) + (ad + bc)i] ⊙ (g + fi)
= [g(ac − bd) − f(ad + bc)] + [g(ad + bc) + f(ac − bd)]i
= (acg − bdg − adf − bcf) + (adg + bcg + acf − bdf)i
= [a(cg − df) − b(cf + dg)] + [a(cf + dg) + b(cg − df)]i
= (a + bi) ⊙ [(cg − df) + (cf + dg)i]
= (a + bi) ⊙ [(c + di) ⊙ (g + fi)].


To see that the distributive property of multiplication holds, we observe the following:

(a + bi) ⊙ [(c + di) ⊕ (f + gi)] = (a + bi) ⊙ [(c + f) + (d + g)i]
= [a(c + f) − b(d + g)] + [a(d + g) + b(c + f)]i
= (ac + af − bd − bg) + (ad + ag + bc + bf)i
= [(ac − bd) + (ad + bc)i] ⊕ [(af − bg) + (ag + bf)i]
= [(a + bi) ⊙ (c + di)] ⊕ [(a + bi) ⊙ (f + gi)].


Notice that 0 + 0i ∈ ℂ and (a + bi) ⊕ (0 + 0i) = a + bi. Thus, 0 + 0i is the additive identity. Notice
that −a, −b ∈ ℝ. Thus, −a − bi ∈ ℂ and (−a − bi) ⊕ (a + bi) = 0 + 0i. So there exist additive
inverses in ℂ. Notice also that 1 + 0i ∈ ℂ and

(1 + 0i) ⊙ (a + bi) = a + bi.


Thus, 1 + 0i is the multiplicative identity in ℂ. Finally, notice that if a, b are not both zero and
α = a² + b² then a/α − (b/α)i ∈ ℂ. Also, notice that

$$\left( \frac{a}{\alpha} - \frac{b}{\alpha} i \right) \odot (a + bi) = \frac{a^2 + b^2}{\alpha} + \frac{ab - ab}{\alpha} i = 1 + 0i.$$

Thus, a/α − (b/α)i is the multiplicative inverse of a + bi.
Therefore, (ℂ, ⊕, ⊙) is a field. It is common practice to write (ℂ, +, ·).
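
The inverse formula can be checked with exact rational arithmetic. In this sketch we assume, for illustration only, that a + bi is stored as the pair (a, b); the helper names `c_mul` and `c_inv` are our own.

```python
from fractions import Fraction


def c_mul(z, w):
    """Multiply (a + bi) and (c + di) stored as pairs: (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def c_inv(z):
    """Return a/alpha - (b/alpha)i, where alpha = a^2 + b^2 must be nonzero."""
    a, b = z
    alpha = a * a + b * b
    if alpha == 0:
        raise ZeroDivisionError("0 + 0i has no multiplicative inverse")
    return (Fraction(a, alpha), Fraction(-b, alpha))


z = (3, 4)
print(c_inv(z))            # the pair (3/25, -4/25)
print(c_mul(c_inv(z), z))  # the pair representing 1 + 0i
```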


In no way have we exhausted the study of fields. A more in depth study of fields happens in Abstract 
Algebra courses. 


D.1 Exercises 


1. Show that every a ∈ Z_p, a ≠ 0, has a multiplicative inverse, thereby proving that (Z_p, ⊕_p, ⊙_p) is a field.

2. The set of complex numbers, ℂ, is a vector space over the field of complex numbers, with the
standard definitions of addition and multiplication.

3. Determine whether or not the set of quaternions, Q, defined below is a field.


Q = {a + bi + cj + dk | a, b, c, d ∈ ℝ}.


Here, i, j, k are defined as the numbers so that i² = j² = k² = ijk = −1 and their products are
defined by the following multiplication table.


Addition is defined in the usual sense and multiplication is defined as usual taking into consideration 
the table above. 
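
Before attempting the proof in Exercise 1, it can help to experiment. The following Python sketch searches by brute force for multiplicative inverses, assuming that multiplication in $\mathbb{Z}_n$ means ordinary integer multiplication followed by reduction modulo $n$. The function name `inverses_mod` is our own. Comparing a prime modulus with a composite one suggests why the hypothesis on $p$ matters, but a finite search is evidence rather than a proof.

```python
def inverses_mod(n):
    """Map each nonzero a in Z_n to an inverse under multiplication mod n, or None."""
    table = {}
    for a in range(1, n):
        # Brute-force search for b with a * b congruent to 1 (mod n).
        table[a] = next((b for b in range(1, n) if (a * b) % n == 1), None)
    return table

print(inverses_mod(7))  # prime modulus
print(inverses_mod(6))  # composite modulus, for contrast
```

For the modulus 7 every nonzero element is paired with an inverse, while for the modulus 6 the elements 2, 3, and 4 have none.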
$C_0[a, b]$, 57
$D(Z_2)$, 62
$\mathcal{F}$, 53
$I_{m \times n}$, 40
Closure Property, 42
Commutative Property, 42 

Adjoint Matrix, 447 

Affine, 188 

Affine Subspace, 470 

Application 
Diffusion Welding, 4, 54, 79, 136, 204, 330
Digital Images, 9, 14, 38, 71, 83, 111 
Image Warping, 5, 53, 209 
Radiography/Tomography, 3 
B
Bar Graph, 62 
Basic Solution, 322, 484 
Basis, 139 
Change of, 224 
Construction Method, 143, 145 
Eigenspace, 349 
Ordered, 161 
Standard, 142 


C
Cauchy-Schwarz Inequality, 389
Change of Basis, 224 
Closure 

Addition, 40, 69, 70 

Scalar Multiplication, 40, 69, 70 
Codomain, 287 
Column Space, 307 
Constraint, 484 
Contrapositive, 507 
Converge 

Matrix Sequence, 365 
Coordinate Projection, 405 
Coordinate Space, 482 
Coordinate Vector, 162 


D 

Data Analysis, 2, 482 

Data Classification, 173, 398, 402, 483 

Data Clustering, 398, 483 

Data Compression, 449, 451, 464 

Decision Making, 2, 483 

Determinant, 235 

Diagonal Matrix, 453 

Diagonalizable, 353 
Orthogonally, 439 

Differential Equations, 2 

Digital Images, 84 
Dimension, 148
Finite Dimensional, 147 
Infinite Dimensional, 148 
Dimension Reduction, 470 
Direct sum, 77 
Dissipative Transformation, 372 
Distributive Property, 42 
Domain Space, 287 
Dynamical Process, 1, 374, 482 


E 
Echelon Form, 22 
Eigenbasis, 349 
Eigenfunction, 374, 482 
Eigenspace, 334 
Eigenvalue, 333 
Algorithm for, 335 
Eigenvector, 333 
Algorithm for, 335 


F 
Factor Algorithm, 335 
Feature Vectors, 483 
Fourier 
Analysis, 2, 429, 482 
Space, 429, 482 
Transform, 429, 482 
Frobenius Norm, 461 


G 
Gram-Schmidt Theorem, 426 


H 

Heat Signature, 54, 205 

Heat State, 54, 205 
Addition, 207 
Eigenvectors, 446 
Evolution Matrix, 208, 443 
Long-Term Behavior, 359 
Scalar Multiplication, 207 

Hermitian Matrix, 447 

Hessian, 357 

Hessian Matrix, 484 

How To Succeed, 5 

Hubble Space Telescope, 2 

Hyperplane, 402 


I 
Identity 

Additive, 42, 68 

Scalar Multiplication, 42 
Image, 9, 38 

Equal, 15 

Scalar product, 40 

Sum, 39 


Infinite Dimensional Space, 482 
Inner Product, 381 
Cc", 401 
T(Z), 401, 430 
Dot Product, 391 
Frobenius, 391 
Image, 417 
Space, 381 
Inner Product Space, 381 
Inner Products, 482 
Properties of, 388 
Inverse 
Additive, 42, 68 
Left, 273 
Pseudo, 448 
Transformation, 269 
Inverse Matrix, 313 
Invertible, 269, 313 


Invertible Matrix Theorem, 316, 482 


Isomorphic, 264 
Isomorphism, 263 


J 
James Webb Space Telescope, 2 


L 

LCD Digit 
Inner Product, 401, 430 

Least Squares Deviation, 483 

Limit Matrix, 365 

Limit Vector, 366 

Linear 
Combination, 84 
Dependence Relation, 128 
Equation, 15 

Linear Dependence, 126 
Determining, 133 

Linear Equation 
Homogeneous, 61 
Solution, 16 

Linear Independence, 126 
Determining, 133 

Linear Programming, 322, 484 

Linearity Condition, 186 

Linearly Dependent, 126 

Linearly Independent, 126 

Long-Term State, 369 


M 
Machine Learning, 2, 398, 482 
Maps Onto, 259 
Mathematical Induction, 512 
Induction Hypothesis, 512 
Matrix, 58 
Addition, 59 
Adjoint, 447 
Block, 182, 416, 462 




Change of Basis, 224 
Characteristic Polynomial of, 344 
Column Space, 307 
Column Stochastic, 373 
Determinant, 235 
Diagonal, 453 
Diagonalizable, 353 
Echelon Form, 22 
Elementary, 30 
Equal, 92 
Frobenius Norm, 461 
Hermitian, 447 
Hessian, 357, 484 
Identity, 29 
Inverse, 313 
Invertible, 313 
Leading entry, 22 
Limit, 365 
Markov, 373 
Normal, 447 
Nullity, 303 
Nullspace, 303 
Orthogonal, 433 
Positive Definite, 381, 387 
Product, 88 
Projection, 419, 420 
Rank, 307, 312 
Reduced Echelon Form, 23 
Representation, 214, 219 
Row Equivalent, 21 
Row Operations, 21 
Row Reduction, 23 
Row Space, 311 
Scalar Multiplication, 59 
Self-Adjoint, 447 
Square, 90 
Stochastic, 373 
Symmetric, 386 
Trace of, 461 
Transpose, 72, 324 
Unitarily Diagonalizable, 447 
Unitary, 447 
Maximal Isomorphism Theorem, 448 
Mitigation for Disease Spread, 2 
Modeling 
Atmospheric Events, 2 
Atomic and Subatomic Interactions, 2 
Celestial Interactions, 1 
Disease Spread, 1
Geologic Events, 2
Population Dynamics, 1


N 
Necessary and Sufficient, 439, 440, 446 
Norm, 385 
Frobenius, 461 
Properties of, 389 
Nullity, 284, 303 
Nullspace, 282, 303 


O 
Object 
Invisible, 248 
Nonzero, 248 
Zero, 248 
Objective Function, 484 
One-to-One, 251 
Onto, 259 
Optimal Design, 2, 357, 483 
Optimization, 322, 357, 484 
Ordered Basis, 161 
Orthogonal 
Matrix, 433 
Complement, 410 
Projection, 413 
Set, 391 
Vector, 390 


Orthogonal Transformation Theorem, 434 
Orthogonally Diagonalizable, 439 


Orthonormal 
Set, 391 
Outer Product, 463 


P 


Positive Definite Matrix, 387 
Principal Component Analysis, 2, 470, 483
Principal Components, 452, 483


Principal Vectors, 470
Projection, 413 
Matrix, 419, 420 
Coordinate, 405 
Image, 417 
onto Line, 414 
onto Subspace, 415 
Orthogonal, 413 
Subspace, 413, 419, 420 


Properties of Transpose, 324 


Pseudo-Inverse, 448, 467 
Rank s, 467 


Pseudo-Inverse Theorem, 465 


R 
Radiograph 
Distinct, 248 
Identical, 248
Nonzero, 248 
Possible, 248 
Zero, 248 
Radiographic Scenario, 178 
Range Space, 287 
Rank, 287, 307, 312 
Rank-Nullity Theorem, 295 


Regression Analysis, 2, 171, 482 


Row Space, 311 


S 
Sample Proof, 503 
Scalar Multiplication
Associative Property, 42 
Closure Property, 42 

Separating Subspace, 173, 402 

Sequence, 61 

Sequence Space, 61 

Set 
Equality, 506 
Intersection, 74 
Orthogonal, 391 
Orthonormal, 391 
Subset, 506 
Sum of, 76 
Union, 74 

Singular Value Decomposition, 454, 459, 483 

Singular Values, 456 

Singular Vectors, 456 

Solution Space, 101 
noun, 112
Spanning Set, 116 
verb, 114 

Spanning Set, 116 

Spectral Theorem, 440 

Standard Basis, 140, 142 

Statement, 501 
Hypothesis, 502
Implications, 502 

Submatrix, 484 

Subset, 65, 68 
Proper, 68 

Subspace, 68 
Affine, 470, 483 
Inherited Property, 68 
Property, 69 

Sum 
Direct, 77 
Set, 76 
Transformation, 192 

Superposition, 104 

Superposition Theorem, 104 

Support Vector Machine, 2, 398, 402, 483 

Symmetric Matrix, 386 

System of Equations, 17 
Equivalent system, 18 
Free Variable, 26 
Homogeneous, 98 
Solution, 17 
Solution Set, 17 
Solving, 18 


T 

Theorem 
Eigenvalues of Symmetric Matrix, 439 
Fundamental Theorem of Algebra, 349 
Gram-Schmidt, 426 
Invertible Matrix, 316, 442 
Mathematical Induction, 512 
Orthogonal Transformation, 434
Projection Matrix, 419, 420 
Properties of Transpose, 324 
Pseudo-Inverse, 465 
Rank-Nullity, 295 
Singular Value Decomposition, 454 
Spectral, 440 
Superposition, 104 
Trace, 461 
Transformation, 185 
Affine, 188 
Characteristic Equation, 344 
Characteristic Polynomial of, 344 
Codomain, 185 
Composition, 193 
Coordinate, 191 
Diagonalizable, 353 
Difference, 193 
Dissipative, 372 
Domain, 185 
Equal, 196 
Identity, 191 
Image, 255 
Inverse of, 269 
Invertible, 269 
Isomorphic, 264 
Left Inverse, 273 
Linear, 186 
Linearity Condition, 186 
Nullity of, 284 
Nullspace of, 282 
One-to-one, 251 
Onto, 259 
Properties Summary, 263 
Range, 185, 287 
Rank of, 287 
Sum, 192 
Zero, 190 
Transpose, 72 
Triangle Inequality, 389 
Trigonometric Identities 
Angle Sum, 446 
Triple Angle, 446 
Trivial Solution, 99, 110 
Tsunami Prediction, 2 


U 
Unitary Matrix, 447 


V

Vector, 42 
Coordinate, 162 
Limit, 366 
Norm, 385 
Orthogonal, 390 
Unit, 385 

Vector Space, 42 